ABSTRACT

This  dissertation  addresses  the  problem  of  finding  nearly 
optimal  detector  structures  for  non-Gaussian  noise  environments. 
It  is  assumed  that  the  noise  statistics  are  unknown  except  for 
a  very  loose  characterization.  Under  this  condition,  the  goal 
is  to  study  adaptive  detector  structures  that  are  simple,  yet 
capable  of  high  levels  of  performance. 

Attention is focused on the discrete-time locally optimal
detector for a constant signal in independent, identically
distributed noise. A definition for non-Gaussian noise is given,
several common univariate density models are exhibited, and
some physical non-Gaussian noise data is discussed.

Two approaches in designing adaptive detector nonlinearities
are presented, where it is assumed that the noise statistics are
approximately stationary. Both proposals utilize simple
measurements of the noise behavior to adapt the detector, and in several
examples the adaptive detectors are shown capable of attaining
nearly optimal performance levels. A simulation is presented
demonstrating their successful application.

The physical noise data is examined, and found to be
contaminated with impulsive noise having a burst-like structure. This
observation suggests that a nonstationary noise model and a
time-varying detector may be appropriate. A nonparametric structure
is proposed to detect the presence of impulsive bursts, and the
performance of the detection algorithm is evaluated. It is then
shown how information provided by the burst detector may be used
to advantage in a signal detector. Performance of the combined
detector structure is analyzed and found to be superior relative
to the performance of any single fixed detector structure in
certain noise environments. A simulation of the proposed structures
is presented and compared to the simulation of the previous
adaptive detectors.

The  problem  of  approximating  known  locally  optimal  detector 
nonlinearities  is  examined  and  shown  to  be  equivalent  to  the 
minimum  mean  square  error  approximation  of  known  nonlinearities. 

A performance index for comparing the performance of
suboptimal threshold detectors operating with constant false alarm
rates is proposed and analyzed. The ratio of indices for two
detectors is shown to have appealing and useful properties in
studying non-zero signal-to-noise ratio detection problems.

Introduction


1.  Motivation 

Extracting information from raw data in the presence of noise is the
ubiquitous problem of communication theory, and there are countless
variations on this fundamental theme. In some contexts, it is important
to estimate a signal or its parameters. In other contexts, it is desired to
detect which, if any, signal is present. Both problems have received
considerable theoretical and practical attention.

In this thesis, a very simple detection problem is posed: Detect the
presence (or absence) of a known constant-level signal in a sequence of
observations that is corrupted by addition of a sequence of observations
from a random noise process. The problem is further simplified by
assuming that the noise observations are statistically independent of
each other and the signal. In several cases, it is further assumed that all
noise observations are identically distributed.

In spite of this apparent simplicity, there remain some important
issues: Namely, how does one approach the detection problem in
non-Gaussian noise environments when the noise statistics are only partially
known? What considerations are important in the design of a detection
algorithm if the goal is to achieve a high level of performance with simple
adaptive structures? How may a detector recognize an abrupt change in
the noise and abate its effects? What properties should be possessed by a
"good" procedure for approximating optimal detectors? How may the
finite sample efficiency of a suboptimal detector be characterized?

2.  Outline  of  the  Thesis 

This thesis has been written as one approach in addressing these and
similar issues. The orientation of the work is not directed toward purely
theoretical ends, nor is it purely an application of known results. Rather,
it attempts to combine elements of both areas. The previous questions
are studied in the combined light of abstract and practical
considerations. Thus, the results which will be presented range from theorems and
proofs to numerical simulations of proposed systems.

Chapter  2 

Chapter  2  introduces  the  detection  problem  which  is  common  to  all 
chapters.  Specifically,  the  Neyman-Pearson  and  locally  optimal  detectors 
are  discussed,  along  with  a  description  of  their  particular  performance 
measures. 

It is obvious that every density family, save for one, comprises
non-Gaussian densities. These densities often characterize the noise in
physical situations where classical assumptions leading to a Gaussian noise
model are violated, and are of considerable practical interest. An
implicit assumption in many cases is that the non-Gaussian densities deviate
from the nominal Gaussian model in a particular way; most importantly,
they are heavy-tailed relative to the Gaussian density. An explicit
characterization for these densities is given in the chapter, and several useful
non-Gaussian univariate density models are exhibited. The chapter
concludes with a discussion of the importance of recognizing the effects of
heavy-tailed noise and its impact on the detection problem as seen in
previous work.

The  appendix  also  introduces  some  non-Gaussian  physical  noise  data 
which  is  used  later  in  the  thesis  for  simulation  studies. 

Chapter  3 

The thrust of Chapter 3 is to consider the design of simple detector
structures. It is assumed that little is known about the non-Gaussian
noise environment, and that the goal is to design detectors with very
simple structures and adaptation algorithms. Two alternative approaches
are proposed: one is an "open loop" procedure where the observed noise
density tails are characterized, and this information is used to update
the detector structure. The other approach is a "closed loop" procedure
where a very simple detector nonlinearity, a three-sectioned piecewise
linear function, is proposed. An adaptive algorithm is then developed for
finding the optimum nonlinearity parameters.

The performance of the two alternative structures and adaptation
algorithms is examined under some known non-Gaussian noise density
models, as well as in a simulation using physical noise data to drive the
algorithm.

Chapter  4 

Chapter 4 also proposes a detector structure for a non-Gaussian
noise environment, but assumes a different philosophy. There it is
observed that stationary noise models may be inappropriate when the
noise source contains bursts of impulsive noise; i.e., the
impulse-producing event is short and well delineated in the noise observation sequence.
Recognizing this fact, a nonstationary model for the noise is proposed,
and a time-varying detector structure is designed that capitalizes upon
the ability of a subsidiary detector to recognize the presence of impulsive
bursts. An algorithm for the subsidiary noise burst detector is
developed using a nonparametric approach. The performance of the
time-varying detector and of the noise burst detector is examined in detail,
and the physical noise data again is used to simulate the detection
system.

Chapter  5 

Answered in Chapter 5 is a question hinted at earlier in Chapter 3:
what is the "best" way in which to approximate a known locally optimal
detector structure? The term "best way" is interpreted as meaning the
procedure yielding an approximation having the greatest efficacy, and a
theorem is proven showing that the answer turns out to be any procedure
that minimizes mean square error relative to the density induced
measure. Implications of this theorem are discussed and its application is
illustrated in two examples.

Chapter  6 

Throughout the thesis, concern is placed on locally optimal detection
and the performance measure of efficacy. Chapter 6 changes course and
examines the finite sample size performance of detectors which
approximate the Neyman-Pearson structure. While closely related to previous
work on performance bounding, this new work does not assume that the
detector test statistic is generated by a likelihood ratio of the exact or
approximate hypothesis densities. Bounds on the error probabilities are
combined to form a single performance index, and several theorems
establish its properties. The ratio of two indices is designated as the
relative bound efficiency, which is shown to have a useful interpretation. The
finite sample performance of some well known detectors is examined
using relative bound efficiency.

Chapter  7 

The  results  presented  in  the  thesis  are  summarized  in  Chapter  7,  and 
some  suggestions  for  further  study  are  made. 


2 


Signal Detection and the Non-Gaussian Noise Environment


Although diverse in purpose and form, radar, sonar, and data
communication systems have at their heart a common important problem:
detection of a signal in a noisy environment. This problem has received
considerable attention in both the engineering and statistical literature,
with viewpoints ranging from concrete details to abstract theory.

The purpose of this chapter is not to provide a thorough review of the
detection problem, or of the noise environment modeling problem.
Rather, this chapter is intended only to provide a common ground from
which some particular problems in detection theory may be viewed;
therefore mathematical rigor is suppressed for the sake of compactness.
Complete exposition of the theory is available from the cited references.
Section 1 provides a short introduction to the detection problem and the
theoretical foundations upon which the remainder of the thesis will rest.
Specifically, the Neyman-Pearson and locally optimal detector structures
are introduced with their associated performance measures. In Section
2, note is made of a particular type of noise environment which will be of
concern in this thesis, and some noise models are discussed. The notion
of a non-Gaussian density is developed to the degree necessary to give it
a characterization. Finally, Section 3 discusses the impact of
non-Gaussian noise on the detection problem and summarizes some results
which are background and motivation to the approaches in this thesis.
The Appendix outlines the characteristics of some physical noise data
which is used later in the thesis to drive various simulations.

1.  Detector  Structures  and  Performance  Measures 

Neyman-Pearson  Detector  Structure 

A binary hypothesis test may be used to model the problem of
detecting a known signal in the presence of noise. Consider the following
detection problem in discrete time over a signaling interval of length $M$.
Let $\theta\mathbf{s} = \theta\{s_1, \ldots, s_M\}$ be a known signal sequence with amplitude
parameter $\theta > 0$, and let $\mathbf{n} = \{n_1, \ldots, n_M\}$ be an independent identically
distributed (iid) noise sequence independent of the signal. Section 2 will
provide justification for the iid restriction on the noise. The detector
observes $\mathbf{x}$, a data sequence $\{x_1, \ldots, x_M\}$, and decides between:

$H_0$: $\mathbf{x} = \mathbf{n}$
$H_1$: $\mathbf{x} = \mathbf{n} + \theta\mathbf{s}$

Here, without loss of generality, we restrict ourselves to the special case
of distinguishing between the two signals $\mathbf{s}_0 = \mathbf{0}$ and $\mathbf{s}_1 = \mathbf{s}$. In the
framework of Neyman-Pearson (NP) hypothesis testing [1-3], the observation $\mathbf{x}$
and the multivariate noise density $f_N$ are used to calculate a likelihood
ratio $\Lambda_{NP}$. This test statistic and a fixed threshold $T_{NP}$ are compared to
arrive at a decision: $H_1$ is chosen when $\Lambda_{NP} > T_{NP}$, and $H_0$ is chosen when
$\Lambda_{NP} < T_{NP}$. More precisely,

$$ \Lambda_{NP} = \frac{f_{H_1}(\mathbf{x})}{f_{H_0}(\mathbf{x})} = \frac{f_N(\mathbf{x} - \theta\mathbf{s})}{f_N(\mathbf{x})} \underset{H_0}{\overset{H_1}{\gtrless}} T_{NP} \tag{2.1} $$

Since the noise is iid, $f_N(\mathbf{n}) = \prod_{i=1}^{M} f(n_i)$, where $f$ is the univariate density
of the noise. For the sake of brevity, in the remainder of this thesis we
adopt the convention that $f(\cdot)$ is the univariate noise density unless
explicitly stated otherwise. Because the logarithm function is monotonic,
a test equivalent to (2.1) is

$$ \lambda_{NP} = \ln \Lambda_{NP} \underset{H_0}{\overset{H_1}{\gtrless}} t_{NP} = \ln T_{NP} \tag{2.2} $$

where

$$ \lambda_{NP} = \sum_{i=1}^{M} g_{NP;i}(x_i) = \sum_{i=1}^{M} \ln \frac{f(x_i - \theta s_i)}{f(x_i)} \tag{2.3} $$

The function $g_{NP;i}$ is memoryless, but time varying because the signal
varies with time. Consideration of the time varying signal case adds
nothing beyond unnecessary complication to the essence of this discussion.
Therefore, we will sacrifice completeness for clarity, restrict attention to
the constant signal $s_i = s$ for $i = 1, \ldots, M$, and replace the sequence
$\{g_{NP;i}\}$ with a memoryless nonlinearity, $g_{NP}$. Figure 2.1 presents a block
diagram of the NP detector structure generated by (2.2) and (2.3).
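The structure defined by (2.2) and (2.3) for a constant signal can be sketched in a few lines. The following is a minimal illustration, not the thesis's implementation: the unit-variance Laplace density is an assumed example of $f$, and the names `laplace_logpdf`, `g_np`, `np_statistic`, and `np_decide` are hypothetical.

```python
import math

# Assumption for illustration: f is a Laplace density scaled to unit variance.
SCALE = 1.0 / math.sqrt(2.0)

def laplace_logpdf(x):
    """ln f(x) for the unit-variance Laplace density (illustrative choice)."""
    return -abs(x) / SCALE - math.log(2.0 * SCALE)

def g_np(x, theta, s, logpdf):
    """Memoryless NP nonlinearity: ln f(x - theta*s) - ln f(x)."""
    return logpdf(x - theta * s) - logpdf(x)

def np_statistic(xs, theta, s, logpdf):
    """Test statistic of (2.3): the sum of g_NP over the M observations."""
    return sum(g_np(x, theta, s, logpdf) for x in xs)

def np_decide(xs, theta, s, logpdf, t_np):
    """Return 1 (choose H1) if the statistic exceeds t_NP, else 0 (choose H0)."""
    return 1 if np_statistic(xs, theta, s, logpdf) > t_np else 0

# Usage: ten observations sitting near the signal level favor H1.
decision = np_decide([1.0] * 10, theta=0.5, s=1.0,
                     logpdf=laplace_logpdf, t_np=0.0)
```

Working with $\ln f$ directly, rather than taking the logarithm of a density ratio, avoids underflow for observations far out in the tails.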


Neyman-Pearson  Detector  Performance 

The performance of a Neyman-Pearson detector is usually measured
in terms of its false alarm rate $\alpha$ and its power of detection $\beta$. These
quantities are defined as

$\alpha$ = Prob(say $H_1$ | $H_0$ true)

$\beta$ = Prob(say $H_1$ | $H_1$ true)

These probabilities are determined by the distribution of $\lambda_{NP}$ under $H_0$
and $H_1$, respectively, and the value of the threshold $t_{NP}$. Thus,


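$$ \alpha = \int_{t_{NP}}^{\infty} p_{H_0}(\lambda)\,d\lambda \tag{2.4} $$

$$ \beta = \int_{t_{NP}}^{\infty} p_{H_1}(\lambda)\,d\lambda \tag{2.5} $$

where $p_{H_0}$ and $p_{H_1}$ denote the densities of the test statistic under $H_0$
and $H_1$.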


The Neyman-Pearson detector is optimal [1-3] in the sense that, for
any given false alarm rate $\alpha_0$, the NP test achieves the maximum
probability of detection $\beta$ in the set of all possible tests with $\alpha \le \alpha_0$.

The performance measures $\alpha$ and $\beta$ are not restricted to the
characterization of NP tests only. The performance of any threshold detection
scheme may be parameterized via (2.4) and (2.5). In principle, if the
noise density $f$ is known, and a known nonlinearity $g$ processes the
observations, then $p_{H_0}$ and $p_{H_1}$ may be computed via transformation of
the hypothesis densities in the case where $\mathbf{x}$ is a single observation and
$M = 1$. For multiple observations, $M > 1$, and $p_{H_0}$ and $p_{H_1}$ may be
computed via $M$-fold convolutions of the transformed hypothesis densities, as
is well known. See, for example, references [3 pp. 215-219] and [4]. In
other instances, carrying out this procedure may be a difficult or
intractable problem, especially when $M$ is large. One must then resort to
calculating $\alpha$ and $\beta$ via Monte Carlo simulations [5,6], series expansions [3, pp.
219-226] and [7], numerical approximation methods [8,9,49],
performance bounding [2, pp. 116-133] and [10], or the Central Limit Theorem
[11 pp. 308-319] and [4].
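Of the alternatives just listed, Monte Carlo simulation is the simplest to illustrate. The sketch below estimates $\alpha$ and $\beta$ for an arbitrary memoryless threshold detector with a constant signal. It is only an illustration under stated assumptions: the function name `monte_carlo_alpha_beta`, its parameters, and the example setup (a linear detector in unit-variance Gaussian noise with $M = 16$) are hypothetical choices, not taken from the cited references.

```python
import random

def monte_carlo_alpha_beta(g, draw_noise, theta, s, M, threshold,
                           trials=20000, seed=0):
    """Estimate (alpha, beta) for the threshold test sum(g(x_i)) > threshold."""
    rng = random.Random(seed)
    false_alarms = 0
    detections = 0
    for _ in range(trials):
        # H0: the observations are noise only.
        h0 = [draw_noise(rng) for _ in range(M)]
        if sum(g(x) for x in h0) > threshold:
            false_alarms += 1
        # H1: the constant signal theta*s is added to fresh noise.
        h1 = [draw_noise(rng) + theta * s for _ in range(M)]
        if sum(g(x) for x in h1) > threshold:
            detections += 1
    return false_alarms / trials, detections / trials

# Hypothetical setup: linear detector g(x) = x, unit-variance Gaussian noise.
alpha_hat, beta_hat = monte_carlo_alpha_beta(
    g=lambda x: x,
    draw_noise=lambda rng: rng.gauss(0.0, 1.0),
    theta=0.5, s=1.0, M=16, threshold=6.58,
)
```

In this particular Gaussian case the test statistic is itself Gaussian, so the estimates can be checked against closed-form values of about 0.05 for $\alpha$ and about 0.64 for $\beta$. For other noise densities or nonlinearities no such check is generally available, which is the situation in which simulation becomes useful.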

The measures $\alpha$ and $\beta$ are often inconvenient to compute, even
though  they  give  a  complete  characterization  of  detector  performance. 
Small  changes  in  the  detector  nonlinearity  or  noise  density  may  change  a 
tractable  computation  into  an  intractable  problem.  Further,  with  the 
exception  of  Central  Limit  Theorem  based  techniques,  most  methods  give 
little  qualitative  or  quantitative  insight  into  understanding  how  changes 
in  the  threshold  or  sample  size  affect  performance.  All  of  the  techniques 
offer  little  illumination  of  performance  sensitivity  as  a  function  of 
changes  in  the  nonlinearity  shape. 


Locally  Optimal  Detector  Structure 

When the signal-to-noise ratio is very small, $\theta \to 0$, the
test statistic may be calculated via the locally optimal (LO) detector
[12,13]. The test becomes

$$ \lambda_{LO} = \sum_{i=1}^{M} g_{LO;i}(x_i) \underset{H_0}{\overset{H_1}{\gtrless}} t_{LO} \tag{2.6} $$

where

$$ g_{LO;i}(x_i) = \frac{\partial}{\partial\theta} \ln \frac{f(x_i - \theta s_i)}{f(x_i)} \bigg|_{\theta = 0} = -s_i \frac{f'(x_i)}{f(x_i)} \tag{2.7} $$


Simply put, $g_{LO;i}$ is the coefficient of $\theta$ in the Taylor series expansion of
$g_{NP;i}$ about $\theta = 0$. Despite the fact that $\mathbf{s}$ may be time varying, the
transformation operating on $x_i$ is not a function of $i$. Instead, (2.6) and
(2.7) imply that the output of a single memoryless nonlinearity $g_{LO}$
should be correlated with the signal. Once again, to simplify the
discussion, we restrict the signal to be constant, rescale the test statistic by $s$,
and limit our efforts to consideration of $g_{LO} = -f'/f$. If $g_{LO}$ is
substituted for $g_{NP}$, then Figure 2.1 also represents a block diagram of the LO
detector structure generated by (2.6) and (2.7).
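The statement that the LO nonlinearity is the first-order Taylor coefficient of the NP nonlinearity can be checked numerically. The sketch below differentiates $\ln f(x - \theta s) - \ln f(x)$ with respect to $\theta$ at $\theta = 0$ by central differences and compares the result with $-s f'(x)/f(x)$. The Cauchy density is used purely as a convenient smooth heavy-tailed example and is an assumption of this illustration (it does not have finite variance); the function names are hypothetical.

```python
import math

def cauchy_logpdf(x):
    """ln f(x) for the standard Cauchy density f(x) = 1 / (pi * (1 + x^2))."""
    return -math.log(math.pi * (1.0 + x * x))

def g_lo_cauchy(x):
    """Closed form of -f'(x)/f(x) for the standard Cauchy density."""
    return 2.0 * x / (1.0 + x * x)

def g_np(x, theta, s, logpdf):
    """NP nonlinearity ln f(x - theta*s) - ln f(x)."""
    return logpdf(x - theta * s) - logpdf(x)

def g_lo_numeric(x, s, logpdf, h=1e-5):
    """Central-difference estimate of d g_NP / d theta at theta = 0."""
    return (g_np(x, h, s, logpdf) - g_np(x, -h, s, logpdf)) / (2.0 * h)

# Largest discrepancy between the numeric derivative and s * (-f'/f).
points = [-3.0, -0.7, 0.0, 0.4, 2.5]
max_error = max(abs(g_lo_numeric(x, 2.0, cauchy_logpdf) - 2.0 * g_lo_cauchy(x))
                for x in points)
```

Note that this example nonlinearity returns toward zero for large $|x|$, so large observations are de-emphasized rather than weighted in proportion to their size.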


Locally  Optimal  Detector  Performance 


Rather  than  derive  rigorously  the  performance  measures  of  efficacy 
and  asymptotic  relative  efficiency  for  locally  optimal  detectors,  we  briefly 
summarize  some  of  the  important  points  of  these  useful  measures.  A 
thorough treatment of this subject is available in [3,12-14].

The Neyman-Pearson detector is optimal in the sense that for given
$\alpha$, it maximizes $\beta$ when the signal amplitude $\theta$ is nonzero. The locally
optimal detector, on the other hand, is optimal in the sense that, for
given $\alpha$, it maximizes $\frac{d}{d\theta}\beta(\theta)\big|_{\theta = 0}$. Zero signal strength is obviously a
limiting worst case. A useful way of comparing two detectors in this
limiting case is to compute their ARE, or asymptotic relative efficiency,
where

$$ ARE_{g_1,g_2} = \lim_{\theta \to 0} \frac{M_2(\alpha,\beta,\theta)}{M_1(\alpha,\beta,\theta)} \tag{2.8} $$

and $M(\alpha,\beta,\theta)$ is interpreted as the number of data observations
necessary to provide a $(\alpha,\beta)$ performance level for signal level $\theta$. As seen in
(2.8), asymptotic relative efficiency is the ratio of the number of samples
necessary in each of two alternative detectors to maintain the same
probabilities of false alarm and detection as the signal-to-noise ratio
approaches zero. Regularity conditions [13] ensure that as $\theta \to 0$, both $M_1$
and $M_2 \to \infty$; thus, ARE is a small signal, large sample size measure of
performance.

A simple expression is available to compute ARE, and is defined as

$$ ARE_{g_1,g_2} = \frac{\eta_f(g_1)}{\eta_f(g_2)} \tag{2.9} $$

where $\eta_f(g)$ is the efficacy of detector $g$ in noise density $f$. It may be
shown [3, p. 228; 13], for iid noise and a fixed detector nonlinearity $g$,
that

$$ \eta_f(g) = \frac{\left[ \int_{-\infty}^{\infty} g'(x) f(x)\,dx \right]^2}{\int_{-\infty}^{\infty} g^2(x) f(x)\,dx} \tag{2.10} $$


where $g$ has zero mean under $f$. This definition of efficacy is subject to
the following regularity conditions in a neighborhood of $\theta = 0$:


(i) $(\lambda_g - E\{\lambda_g\}) / \sqrt{\mathrm{var}\{\lambda_g\}}$ is asymptotically normal with zero mean and unity
variance. Here, $\lambda_g$ is the test statistic of a detector using
memoryless nonlinearity $g$.

(ii) $\theta = kM^{-1/2}$, where $k$ is a nonzero constant.

(i£i)So  =Eh^' 

The LO detector maximizes efficacy [13]; as a result $ARE_{g,g_{LO}} \le 1$ for
all detectors $g$ satisfying the regularity conditions. Note that (2.10) is a
ratio of two expectations, and is therefore usually more convenient to
compute than an $M$-fold convolution of a probability density.
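To illustrate how directly (2.10) can be evaluated, the sketch below computes efficacies by trapezoid-rule integration and forms the ratio in (2.9). The choices here are assumptions made for illustration only: unit-variance Gaussian noise (for which $-f'/f = x$, so the linear detector is locally optimal), a hyperbolic-tangent soft limiter as the competing nonlinearity, and the hypothetical helper names `integrate` and `efficacy`.

```python
import math

def gaussian_pdf(x):
    """Zero-mean, unit-variance Gaussian density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def integrate(func, lo=-12.0, hi=12.0, n=24000):
    """Composite trapezoid rule on [lo, hi]; the tails beyond are negligible here."""
    h = (hi - lo) / n
    total = 0.5 * (func(lo) + func(hi))
    for k in range(1, n):
        total += func(lo + k * h)
    return total * h

def efficacy(g, g_prime, pdf):
    """Efficacy of (2.10): (E[g'])^2 / E[g^2], assuming g has zero mean under pdf."""
    numerator = integrate(lambda x: g_prime(x) * pdf(x)) ** 2
    denominator = integrate(lambda x: g(x) ** 2 * pdf(x))
    return numerator / denominator

# Linear detector g(x) = x, which is the LO nonlinearity for Gaussian noise.
eta_linear = efficacy(lambda x: x, lambda x: 1.0, gaussian_pdf)
# Soft limiter g(x) = tanh(x); it is odd, hence zero mean under a symmetric pdf.
eta_soft = efficacy(math.tanh, lambda x: 1.0 - math.tanh(x) ** 2, gaussian_pdf)
# ARE of the soft limiter relative to the LO detector, as in (2.9).
are_soft_vs_linear = eta_soft / eta_linear
```

The linear detector's efficacy evaluates to one for unit-variance noise, and the ratio for the soft limiter falls below one, consistent with the bound stated above. Each efficacy requires only two one-dimensional integrals regardless of the sample size $M$.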

A  criticism  of  ARE  is  that  it  is  a  performance  measure  based  upon  a 
limiting  case.  While  ARE  measures  asymptotic  performance,  it  may  not 
give  a  good  indication  of  the  relative  merits  of  two  detectors  in  a  small 
sample,  nonzero  signal  environment.  Some  recent  work  [15,16,50]  has 
concentrated  on  examining  the  convergence  of  relative  efficiency  to  ARE. 
Further,  because  efficacy  is  only  a  ratio  of  two  expectations,  it  may  be 
argued  that  two  detectors  with  identical  efficacies  need  not  have  similar 
small  sample  performance. 

Despite these criticisms, ARE, efficacy, and the LO detector shall
receive the most attention in this thesis for several reasons. First,
efficacy is a convenient, accepted, and well studied measure of
performance. Second, small signal detection is an important problem, and the
zero (or infinitesimal) signal is the limiting bound. The NP and LO
detectors are asymptotically equivalent in the limiting case, even though LO
detection is not optimal for a nonzero signal [13,51]. Third, for very small
signals there is a close correspondence between the forms of the NP and
LO detectors, which suggests that by paying attention to the LO detection
problem, it should be possible to gain insight into the issue of NP
detector design. The relationship between the nonlinearities for the two types
of detectors is


(2.11) 


and,  from  [17] 


$$ g_{NP}(x) = \int_{x - \theta s}^{x} g_{LO}(u)\,du \tag{2.12} $$

The latter equation implies that if $\theta$ is small, and if $g_{LO}$ is approximately
constant over the range of integration, then $g_{NP} \approx \theta s\, g_{LO}$.
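Because $g_{LO} = -f'/f$ is the negative derivative of $\ln f$, integrating it from $x - \theta s$ to $x$ returns $\ln f(x - \theta s) - \ln f(x)$, which is the NP nonlinearity for a constant signal. The sketch below checks this identity and the small-signal approximation $g_{NP} \approx \theta s\, g_{LO}$ numerically. The standard Cauchy density and all function names are assumptions chosen for illustration.

```python
import math

def cauchy_logpdf(x):
    """ln f(x) for the standard Cauchy density (illustrative choice of f)."""
    return -math.log(math.pi * (1.0 + x * x))

def g_lo(x):
    """-f'(x)/f(x) for the standard Cauchy density."""
    return 2.0 * x / (1.0 + x * x)

def g_np(x, theta, s):
    """NP nonlinearity for a constant signal: ln f(x - theta*s) - ln f(x)."""
    return cauchy_logpdf(x - theta * s) - cauchy_logpdf(x)

def g_np_from_integral(x, theta, s, n=2000):
    """Trapezoid-rule integral of g_LO over [x - theta*s, x]."""
    lo = x - theta * s
    h = theta * s / n
    total = 0.5 * (g_lo(lo) + g_lo(x))
    for k in range(1, n):
        total += g_lo(lo + k * h)
    return total * h

# Identity check at a moderate signal level.
identity_error = abs(g_np(1.5, 0.3, 1.0) - g_np_from_integral(1.5, 0.3, 1.0))
# Small-signal check: g_NP is close to theta * s * g_LO when theta is small.
approx_error = abs(g_np(1.5, 0.01, 1.0) - 0.01 * 1.0 * g_lo(1.5))
```

The approximation error shrinks roughly with the square of $\theta$, since it comes from the variation of $g_{LO}$ across the interval of integration.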


2.  The  Non-Gaussian  Noise  Environment 


General  Assumptions 

Before discussing some noise models of interest later in the thesis, it
is necessary to state some of the fundamental assumptions about the
noise environment and the models which will be used. First, as stated in
the previous section, we are interested in the discrete time environment.
Second, we assume that the noise samples are independent and
identically distributed. This is a very strong assumption with extensive and
rigorous requirements on the noise behavior, but it allows simplification
of the analysis. For example, the difficult problem of modeling
dependent non-Gaussian noise is eradicated by the independence assumption.
Further, a noise with a nonstationary distribution implies that
time-varying detector structures are necessary, which, in general, may be quite
difficult to specify and implement.

What is of more interest than strict mathematical fulfillment of the
iid requirement is that the assumption be approximately true for the
physical case of interest. Even though noise is usually correlated due to
the finite bandwidth of a channel, adjacent samples may be
approximately independent, provided the sampling rate is low enough. The noise
environment is always nonstationary, as no real source has unchanging
statistics for the infinite past and the infinite future. Over finite
intervals, however, the statistics may appear stationary, or the noise statistics
may be changing slowly enough that they appear approximately
stationary and may be tracked by an adaptive system.

To provide a starting point, then, it is not unreasonable to assume iid
noise. This assumption is a divergence from the reality of physical noise
environments, but for that price clarity and mathematical simplicity are
purchased. An implication of this assumption is that the noise
environment is described adequately by a univariate density.

There is an abundance of information on the measured statistics of
physical noise sources, a full report on which is beyond the intended
scope of this chapter. Instead, as the emphasis is on understanding the
problem of finding near-optimal detectors for non-Gaussian noise, the
following subsections present some common noise models which will be
used in the following chapters. For convenience, the noise densities will
be assumed here to be zero mean and unit variance.

Gaussian  Noise  Model 

A Gaussian noise background is the classical assumption in the
design and analysis of detection systems. Here, the univariate noise
density is the well known expression

$$ f(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2} \tag{2.13} $$

which leads to the LO nonlinearity

$$ g_{LO}(x) = x \tag{2.14} $$

For convenience, (2.14) will be referred to as the linear detector, ld.

The Gaussian assumption has attractive features in that it is
mathematically tractable and the optimal detector structure is a linear
processor. Strong justification for this noise model is available due to
Central Limit Theorem (CLT) arguments, for at least two reasons: first,
the noise source often may be considered as a shot noise process
comprising a very large number of small effects with additive cumulative
effect; e.g., thermal noise. Second, the finite bandwidth of many
channels "averages" together the noise process, tending to make the noise at
the channel output Gaussian. In the limit, as the channel bandwidth
approaches zero, it may be shown [18 Thm. 2.4] that the noise output
process of a narrowband channel converges in distribution to a Gaussian
process.

Rebuttal  of  the  Gaussian  Model 

Despite these arguments, measurements of different noise
environments have led to the conclusion that the true noise distribution often is
described better by a heavier tailed pdf; e.g. [19-22, 25-28,30,35,36,48].
Also, see the discussion and bibliographies of [17,23,32,33]. This type of
noise may be ascribed to a nominal Gaussian environment with a
heavy-tailed impulsive noise contaminant. Another consideration is that a real
noise comprises a finite sum of random events; in a shot process there
can only be a finite number of contributors, and any real channel has
nonzero bandwidth. The result is that CLT convergence to the Gaussian
pdf is not complete. Instead, the observed noise pdf is most nearly
Gaussian near the mean, with the tails converging to the Gaussian pdf only in
the limiting case.


For example [24 p. 103], if $X_m = \frac{1}{\sqrt{m}} \sum_{i=1}^{m} Y_i$ and the $Y_i$ are iid random
variables with continuous distribution function, $EY = 0$, $EY^2 = 1$, and
$EY^3 = 0$, then the distribution function of $X_m$ is approximated by
$\Phi(x_m)$ for $|x_m| \le \sqrt{\ln m}$, where $\Phi$ is the Gaussian
distribution function. Convergence of the sum distribution to the
Gaussian is from the mean outwards, and the size of the Gaussian-like region
is proportional to $\sqrt{\ln m}$.


Contained in this discussion is a partially constructive, but loose,
characterization of non-Gaussian noise densities. Obviously, all families
of densities save for one are non-Gaussian. However, in this thesis only
particular types of deviations from the nominal Gaussian family are of
concern. The term non-Gaussian noise density will refer to unimodal,
symmetric densities which have a Gaussian-like shape in a region
centered about the mode. These densities also possess tails that are heavier
than the Gaussian, for they converge to zero asymptotically at a rate less
than an equal variance Gaussian density. This type of density is often
referred to as being heavy-tailed or long-tailed.
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Middleton’s  Class  A  and  B  Density  Models 

Noise density models may be classified into one of two categories: physically motivated models and empirical models. The first group of models takes into consideration physical aspects of the noise situation and attempts to describe the density from this physical accounting. The second group of models uses convenient distributions which seem to agree well with observed characteristics of the noise.

Middleton’s Class A and Class B models fall into the category of physically motivated models. Without exposition of the details found in [20,25-28], both models intend to characterize situations where the noise is nominally Gaussian with an additive impulsive noise component. The Class A model assumes that these spikes are of lesser bandwidth than the receiver, and as such, do not generate a transient response of significant duration relative to the spike duration. The Class B model assumes the reverse, and the spikes produce relatively long transients.

The Class B model comprises an infinite series of confluent hypergeometric functions, each of which is generally defined by an infinite series [29 p. 504]. Because of its unwieldiness, we will not consider it further.

The Class A model comprises an infinite series of Gaussian density-like terms, and may be written as

$$ f_A(x) = e^{-A} \sum_{m=0}^{\infty} \frac{A^m}{m!} \, \frac{1}{\sqrt{2\pi\sigma_m^2}} \, e^{-x^2/2\sigma_m^2} . \tag{2.15} $$

The parameter $A$ is called the overlap index and is the product of the duration of individual events in the impulsive component and the mean rate of the shot process generating the impulsive component events. The other parameter is defined as

$$ \sigma_m^2 = \frac{m/A + \Gamma'}{1 + \Gamma'} , $$

where $\Gamma'$ is the ratio of the power in the Gaussian background to the power in the impulsive component, so that a small $\Gamma'$ corresponds to a strong impulsive component. Both parameters are directly related to physical measurements of the noise environment [30].


Figures 2.2 - 2.5 compare some representative unit variance Class A densities and the Gaussian density. The Class A densities have Gaussian-like behavior near $x = 0$, as evidenced by their parabolic shape on the log-scaled plots. For large $|x|$, however, they have a much heavier tail behavior than the Gaussian density.

The LO nonlinearity associated with $f_A$ may be written as

$$ g_A(x) = -\frac{d}{dx} \ln f_A(x) = x \, \frac{\sum_{m=0}^{\infty} \frac{A^m}{m!\,\sigma_m^3} \, e^{-x^2/2\sigma_m^2}}{\sum_{m=0}^{\infty} \frac{A^m}{m!\,\sigma_m} \, e^{-x^2/2\sigma_m^2}} . \tag{2.16} $$

Figures 2.6 and 2.7 compare the LO nonlinearities for a Gaussian density and the Class A densities of the previous figures. While $g_A$ has nearly linear behavior for small $|x|$, the effect of large observations is greatly reduced with respect to a linear processor.


The Gaussian-Gaussian ε-mixture Family

Another useful and interesting class of noise distributions is the Gaussian-Gaussian ε-mixture family. It may be written as

$$ f_\varepsilon(x) = (1-\varepsilon) f_0(x) + \varepsilon f_1(x) \tag{2.17} $$

where $f_0$ and $f_1$ are both zero mean Gaussian densities, with variances $\sigma_0^2$ and $\sigma_1^2$, with $0 < \varepsilon < 1$ typically assuming a small value and $\sigma_1^2 > \sigma_0^2$.

Fig. 2.2. Representative Middleton Class A densities $f_A$ with $A = .05$ compared to Gaussian density.

Fig. 2.3. Logarithm of Middleton Class A densities $f_A$ with $A = .05$ compared to log of Gaussian density.

Fig. 2.4. Representative Middleton Class A densities $f_A$ with $\Gamma' = .4$ compared to Gaussian density.

Fig. 2.5. Logarithm of Middleton Class A densities $f_A$ with $\Gamma' = .4$ compared to log of Gaussian density.

Fig. 2.6. Locally optimal nonlinearity $g_A$ for Middleton Class A densities with $A = .05$ compared to unity slope linear detector ld.

Fig. 2.7. Locally optimal nonlinearity $g_A$ for Middleton Class A densities with $\Gamma' = .4$ compared to unity slope linear detector ld.

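The LO nonlinearity associated with this density is

$$ g_\varepsilon(x) = x \, \frac{\dfrac{1-\varepsilon}{\sigma_0^2} f_0(x) + \dfrac{\varepsilon}{\sigma_1^2} f_1(x)}{(1-\varepsilon) f_0(x) + \varepsilon f_1(x)} . \tag{2.18} $$

Figures 2.8 - 2.11 compare some representative Gaussian-Gaussian ε-mixture densities and the Gaussian density. Figures 2.12 and 2.13 compare the LO nonlinearities $g_\varepsilon$ to a linear processor.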

The density $f_\varepsilon$ is attractive in that it is a relatively simple empirical model, and has been proposed for describing heavy tailed non-Gaussian noise [31,32]. Recently [27,33], it was shown that it also may be considered as a tractable simplification of Middleton's Class A model arising by truncating all terms of $f_A$ for $m > 1$. The parameters of $f_\varepsilon$ have a simple relationship to the parameters of $f_A$, given here as

$$ \varepsilon = \frac{A}{1+A} \tag{2.19} $$

$$ \frac{\sigma_1^2}{\sigma_0^2} = 1 + \frac{1}{A\Gamma'} \tag{2.20} $$

Therefore, $f_\varepsilon$ may be considered a quasi-physically based model. Figure 2.14 compares the LO nonlinearity for $f_A$ and the corresponding LO nonlinearity for the approximating density $f_\varepsilon$.
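The parameter mapping of (2.19) and (2.20) can be checked numerically. The following is a minimal sketch, assuming the usual unit-variance form of the Class A series with term variances $\sigma_m^2 = (m/A + \Gamma')/(1 + \Gamma')$; the function names and the truncation length `terms` are illustrative choices, not taken from the text.

```python
import math

def class_a_sigma2(m, A, gamma):
    """Variance of the m-th Gaussian term in the unit-variance Class A series."""
    return (m / A + gamma) / (1.0 + gamma)

def class_a_pdf(x, A, gamma, terms=30):
    """Class A density with the infinite series truncated after `terms` terms
    (an assumption of this sketch; the Poisson weights decay very quickly)."""
    total = 0.0
    weight = math.exp(-A)              # Poisson weight e^{-A} A^m / m! at m = 0
    for m in range(terms):
        s2 = class_a_sigma2(m, A, gamma)
        total += weight * math.exp(-x * x / (2.0 * s2)) / math.sqrt(2.0 * math.pi * s2)
        weight *= A / (m + 1)          # advance the weight to term m + 1
    return total

def mixture_params(A, gamma):
    """Map (A, Gamma') to (epsilon, sigma_1^2 / sigma_0^2) as in (2.19)-(2.20)."""
    eps = A / (1.0 + A)
    ratio = 1.0 + 1.0 / (A * gamma)
    return eps, ratio

# Values quoted in the caption of Fig. 2.14
eps, ratio = mixture_params(0.1111, 0.0909)
print(eps, ratio)
```

With the values quoted for Fig. 2.14 ($A = .1111$, $\Gamma' = .0909$), the mapping returns approximately $\varepsilon = .10$ and a variance ratio of 100, which agrees with that caption.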

In addition to a Gaussian-Gaussian mixture, others have considered different contaminants of a nominal Gaussian background, including Laplace contamination [23]. This mixture density also figures importantly in Huber’s theory of robustness [34], where the contaminant density is merely assumed to have log-convex shape.

Fig. 2.8. Representative Gaussian-Gaussian ε-mixture densities $f_\varepsilon$ with $\varepsilon = .05$ compared to Gaussian density.

Fig. 2.9. Logarithm of Gaussian-Gaussian ε-mixture densities with $\varepsilon = .05$ compared to log of Gaussian density.

Fig. 2.10. Representative Gaussian-Gaussian ε-mixture densities with $\sigma_1^2/\sigma_0^2 = 100$ compared to Gaussian density.

Fig. 2.12. Locally optimal nonlinearity $g_\varepsilon$ for Gaussian-Gaussian ε-mixture densities with $\varepsilon = .05$ compared to unity slope linear detector ld.

Fig. 2.13. Locally optimal nonlinearity $g_\varepsilon$ for Gaussian-Gaussian ε-mixture densities with $\sigma_1^2/\sigma_0^2 = 100$ compared to unity slope linear detector ld.

Fig. 2.14. Locally optimal nonlinearity $g_A$ for Middleton Class A density with $A = .1111$ and $\Gamma' = .0909$ compared to locally optimal nonlinearity $g_\varepsilon$ for Gaussian-Gaussian ε-mixture density with $\sigma_1^2/\sigma_0^2 = 100$ and $\varepsilon = .10$. These values satisfy Eqns. (2.19) and (2.20).

Laplace Density

This density is also known as the double sided exponential density, and for a noise variance of $\sigma^2$ may be written as

$$ f_L(x) = \frac{1}{\sqrt{2}\,\sigma} \, e^{-\sqrt{2}\,|x|/\sigma} . \tag{2.21} $$

The LO detector associated with the Laplace density is

$$ g_L(x) = \operatorname{sgn}(x) . \tag{2.22} $$

We will refer to (2.22) as the sign detector, sd. The Laplace density is a convenient model, for it has simple form. Measurements on ocean acoustic data suggest that the Laplace density may be a good model for certain underwater environments [35,36]. While the density clearly has tails heavier than the Gaussian, it also has a non-Gaussian-like mode. Instead of the smooth, infinitely differentiable mode of the Gaussian density, the Laplace density has a cusp. Figure 2.15 compares the two densities for equal variances.

Generalized Gaussian Density

This family of densities is a generalization of the Gaussian density and may be written as

$$ f_c(x) = \frac{c\gamma}{2\Gamma(1/c)} \, e^{-|\gamma x|^c} \tag{2.23} $$

where the parameter $\gamma$ is defined as

$$ \gamma = \frac{1}{\sigma} \left[ \frac{\Gamma(3/c)}{\Gamma(1/c)} \right]^{1/2} $$

and $\Gamma$ is the gamma function, given by

$$ \Gamma(x) = \int_0^\infty t^{x-1} e^{-t} \, dt . $$

The LO nonlinearity associated with $f_c$ is

$$ g_c(x) = c\gamma^c |x|^{c-1} \operatorname{sgn}(x) . \tag{2.24} $$

This family of densities includes the Gaussian density for $c = 2$ and the Laplace density for $c = 1$. It has received attention in previous work, both as a convenient heavy-tailed density for theoretical analyses [17,37-39] and as a reasonable model for observed noise densities [21]. This family has also been used to describe lighter tailed densities than the Gaussian [36], with values of $c \approx 3$. As $c \to \infty$, the density tends toward a uniform distribution.

Figure 2.16 compares some members of the generalized Gaussian density family on a linear scale, and in Fig. 2.17 they are compared on a logarithmic scale. Some samples of the LO nonlinearity may be found in Fig. 2.18.

Fig. 2.15. The Laplace density $f_L$ and Gaussian density compared with zero means and unit variances.
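The generalized Gaussian family is simple enough to evaluate directly. The sketch below assumes the parameterization $f_c(x) = \frac{c\gamma}{2\Gamma(1/c)} e^{-|\gamma x|^c}$ with $\gamma$ scaled for variance $\sigma^2$; the function names are hypothetical, and the value returned at $x = 0$ is a convention of this sketch.

```python
import math

def gg_gamma(c, sigma=1.0):
    """Scale parameter gamma giving the density a variance of sigma**2."""
    return math.sqrt(math.gamma(3.0 / c) / math.gamma(1.0 / c)) / sigma

def gg_pdf(x, c, sigma=1.0):
    """Generalized Gaussian density f_c(x)."""
    g = gg_gamma(c, sigma)
    return c * g / (2.0 * math.gamma(1.0 / c)) * math.exp(-abs(g * x) ** c)

def gg_lo(x, c, sigma=1.0):
    """LO nonlinearity g_c(x) = c * gamma^c * |x|^(c-1) * sgn(x)."""
    if x == 0.0:
        return 0.0  # odd-symmetry convention; for c < 1 the true limit diverges
    g = gg_gamma(c, sigma)
    return c * g ** c * abs(x) ** (c - 1.0) * math.copysign(1.0, x)

# c = 2 gives the linear detector, c = 1 a scaled sign detector
print(gg_lo(3.0, 2.0), gg_lo(3.0, 1.0), gg_lo(-3.0, 1.0))
```

For $c = 2$ the nonlinearity reduces to $x/\sigma^2$, and for $c = 1$ it is a constant multiple of $\operatorname{sgn}(x)$, matching the Gaussian and Laplace special cases noted in the text.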


Johnson $S_u$ Family: Transformed Gaussian Density

Another family of heavy tailed pdf’s is the Johnson $S_u$ family. It has been proposed [40] that certain heavy tailed non-Gaussian densities may be thought of as arising from nonlinear distortions of the Gaussian density. For example, if $Y$ is distributed as a zero mean, unit variance Gaussian random variable, and a new random variable is defined as

$$ X = u \sinh(Y/\delta) \tag{2.25} $$

then the density of $X$ has variance $\sigma^2$ (unit variance for $\sigma = 1$), and belongs to the Johnson $S_u$ family, given by

$$ f_\delta(x) = \frac{\delta}{u\sqrt{2\pi}\,\sqrt{1 + x^2/u^2}} \, e^{-(\delta^2/2)\left[\sinh^{-1}(x/u)\right]^2} \tag{2.26} $$

with

$$ u = \left[ \frac{2\sigma^2}{e^{2/\delta^2} - 1} \right]^{1/2} . \tag{2.27} $$

The parameter $\delta$ controls the tail heaviness. As $\delta \to \infty$, the pdf tails become progressively lighter, and approach Gaussian tails in the limit. Like the generalized Gaussian family, a single parameter indexes the range of tail behaviors.

The LO nonlinearity associated with $f_\delta$ may be written as

$$ g_\delta(x) = \frac{x}{u^2} \left[ 1 + \frac{x^2}{u^2} \right]^{-1} + \frac{\delta^2}{u} \left[ 1 + \frac{x^2}{u^2} \right]^{-1/2} \sinh^{-1}\frac{x}{u} . \tag{2.28} $$

Some representative members of the Johnson $S_u$ family are shown in Fig. 2.19 on a linear scale, and in Fig. 2.20 on a logarithmic scale. The corresponding LO nonlinearities $g_\delta$ are given in Figure 2.21.

Fig. 2.16. Representative generalized Gaussian densities $f_c$ for various values of parameter $c$.

Fig. 2.17. Logarithm of generalized Gaussian densities $f_c$.

Fig. 2.18. Locally optimum nonlinearity $g_c$ for generalized Gaussian densities.
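The Johnson $S_u$ expressions can be cross-checked numerically. The sketch below assumes the forms $X = u \sinh(Y/\delta)$ and $u^2 = 2\sigma^2/(e^{2/\delta^2} - 1)$, and writes the LO nonlinearity as $-\frac{d}{dx}\ln f_\delta(x)$; the function names are illustrative only.

```python
import math

def su_scale(delta, sigma=1.0):
    """Scale u chosen so that X = u*sinh(Y/delta) has variance sigma**2."""
    return math.sqrt(2.0 * sigma ** 2 / (math.exp(2.0 / delta ** 2) - 1.0))

def su_pdf(x, delta, sigma=1.0):
    """Johnson S_u density obtained by transforming a unit Gaussian Y."""
    u = su_scale(delta, sigma)
    r = x / u
    return (delta / (u * math.sqrt(2.0 * math.pi) * math.sqrt(1.0 + r * r))
            * math.exp(-0.5 * (delta * math.asinh(r)) ** 2))

def su_lo(x, delta, sigma=1.0):
    """LO nonlinearity -d/dx ln f(x) for the Johnson S_u density."""
    u = su_scale(delta, sigma)
    r = x / u
    return ((x / u ** 2) / (1.0 + r * r)
            + (delta ** 2 / u) * math.asinh(r) / math.sqrt(1.0 + r * r))

# Compare the closed form with a central-difference derivative of -ln f
x0, delta, h = 0.8, 1.5, 1e-5
numeric = -(math.log(su_pdf(x0 + h, delta)) - math.log(su_pdf(x0 - h, delta))) / (2.0 * h)
print(numeric, su_lo(x0, delta))
```

The closed-form nonlinearity and the finite-difference value should agree closely, which is a convenient sanity check when the equations have to be re-entered by hand.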


3. Detectors and the Non-Gaussian Noise Environment

Up to this point, basic detector structures have been reviewed, and some simple noise density models have been presented. We now consider some effects of a non-Gaussian noise environment upon the detection problem.

If two noise processes with equal variances are compared, one Gaussian and the other non-Gaussian in the sense previously discussed, it will become apparent that the non-Gaussian noise has many more large valued observations, or a larger degree of scatter. In the estimation or regression contexts, one might say that the non-Gaussian noise process observations contain a larger number of outliers.

Fig. 2.19. Representative Johnson $S_u$ densities $f_\delta$ compared to the Gaussian density.

Fig. 2.20. Logarithm of Johnson $S_u$ densities $f_\delta$ compared to log of Gaussian density.

Fig. 2.21. Locally optimum nonlinearity $g_\delta$ for Johnson $S_u$ densities compared to unity slope linear detector ld.

Relation  between  Non-Gaussian  Estimation  and  Detection 

Work in robust estimation has long suggested that, in heavy tailed noise, a robust estimator of the mean should reduce the influence of very large data observations while leaving observations near the mean relatively unchanged [41]. Any estimator uses a finite number of observations, and an excessive number of outliers unduly affects the estimate, generally increasing its variance with respect to a robust estimator. Note that "excessive", as used here, is a qualitative term, with a meaning dependent upon the particular estimator. Estimators based upon Gaussian noise statistics typically have little protection against outliers, for the simple reason that the effect of large observations is undiminished in any way; even the addition of a single observation with very large magnitude relative to the rest of the observations may significantly distort the outcome.
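The sensitivity of the sample mean to a single large observation is easy to demonstrate. The following is a small illustration with made-up numbers; the median and the limited mean are used here only as familiar examples of estimators that reduce the influence of outliers, and the limiter threshold is an arbitrary assumption.

```python
from statistics import mean, median

def limited_mean(samples, limit):
    """Sample mean after limiting each observation to [-limit, +limit]."""
    return mean(max(-limit, min(limit, s)) for s in samples)

clean = [-0.4, -0.2, -0.1, 0.0, 0.1, 0.2, 0.4]   # symmetric about zero
contaminated = clean + [50.0]                    # one very large observation

print("mean, clean data:       ", mean(clean))
print("mean, contaminated:     ", mean(contaminated))
print("median, contaminated:   ", median(contaminated))
print("limited mean (limit 1): ", limited_mean(contaminated, 1.0))
```

The single outlier moves the sample mean far from the bulk of the data, while the median and the limited mean stay close to the original center.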

The effect of an outlier on an estimator can be measured through the calculation of a sensitivity curve. Andrews, et al. [31], present numerous examples of the sensitivity curves for some common estimators. In estimation of the mean, it turns out that the optimal estimator has a sensitivity curve given by $\psi(x) = -\frac{d}{dx} \ln f(x)$. This expression is identical to (2.7), the formula for $g_{lo}$. Considering the duality between estimation and detection, this is not a surprising result. Further, the Cramer-Rao inequality [1 p. 127] evaluates the efficiency of an estimator $g$ via an expression identical to the efficacy of detector nonlinearity $g$.

Since the test statistic in a detector also uses only a finite number of observations, the detector nonlinearity must reduce the impact of outliers. NP and LO detector nonlinearities related to non-Gaussian heavy tailed densities are typically composed of a linear region surrounded by tails which compress, limit, or even blank large data observations. The previous examples of LO nonlinearities for some common heavy-tailed noise models exhibit this type of behavior.

Non-Gaussian  Density  Characterization 

Given previously, and repeated here, is the loose definition of the non-Gaussian noise densities which will be of interest in the following chapters: the noise pdf is unimodal, symmetric about its mean placed at the origin, and has nonzero support over the entire real line. Near the mean, the density has a Gaussian-like shape, and the tails asymptotically decrease to zero, but at a slower rate than the Gaussian; i.e.,

$$ \lim_{|x| \to \infty} e^{x^2/2\sigma^2} f(x) = \infty , $$

where $\sigma^2$ is the noise variance. Note that both Middleton’s Class A density and the Gaussian-Gaussian ε-mixture satisfy this definition, despite being the sums of various Gaussian densities.
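As a short check of why a sum of Gaussian densities can still meet this tail condition, consider the ε-mixture of (2.17). The derivation below is a sketch that assumes $f_0$ and $f_1$ have variances $\sigma_0^2 < \sigma_1^2$, so that the overall noise variance $\sigma^2$ lies strictly below $\sigma_1^2$:

```latex
\begin{align*}
\sigma^2 &= (1-\varepsilon)\,\sigma_0^2 + \varepsilon\,\sigma_1^2 \;<\; \sigma_1^2 , \\
e^{x^2/2\sigma^2} f_\varepsilon(x)
  &\ge \frac{\varepsilon}{\sqrt{2\pi}\,\sigma_1}
     \exp\!\left[ \frac{x^2}{2} \left( \frac{1}{\sigma^2} - \frac{1}{\sigma_1^2} \right) \right]
  \;\to\; \infty \qquad (|x| \to \infty) .
\end{align*}
```

The wider component alone dominates an equal variance Gaussian in the tails. The same argument applies to each term of the Class A series whose variance exceeds the overall variance.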

The following characteristics loosely specify the LO detector nonlinearities related to the desired types of non-Gaussian densities:

(a) continuous, with continuous low-order derivatives

(b) approximately linear at the origin

(c) odd symmetric about the origin

(d) strictly positive to the right of the origin

(e) monotone in the tail regions

Note that, in light of (2.7), specification of the LO nonlinearity behavior is equivalent to specifying the form of the associated density.

Motivation  for  Nearly  Optimal  Detection 

Implicit in both the NP and LO detection methods is a requirement that the noise pdf must be known exactly. This knowledge is needed in order to construct $g_{np}$ or $g_{lo}$. In general, the noise statistics are not known with precision and the design of the LO or NP detector is not straightforward. An additional consideration is that often the noise environment is nonstationary, and an adaptive structure is necessary.

Alternative detection strategies are available, and among these are (1) detectors which are robust with respect to deviations from a nominal noise environment [34,37,42,43]; (2) nonparametric detectors which use only very general information about the underlying noise distribution [44,45]; and (3) fixed suboptimal detectors with acceptable performance [27,46-48]. There are some problems with each of the three strategies: first, it is not clear in the design of minimax robust detectors what density should be chosen as the nominal environment and what class of densities should be chosen as unfavorable alternatives. Also, solution of the problem may be quite difficult. Second, while nonparametric methods are usually simple and afford some degree of protection against heavy tailed noises, they may not be as efficient as possible. Third, a fixed suboptimal detector may be simple to implement, but may suffer severe performance degradation should the noise environment change from nominal conditions.

With these ideas about the non-Gaussian noise environment in mind, the following chapters explore methods related to nearly optimal, yet simple, detector nonlinearities. The previous discussion should make clear the necessity for simple methods which can adapt the detector structure to unknown, and possibly changing, noise environments.


Appendix 2.1

Throughout the thesis, reference will be made to some Arctic under-ice noise data. This data is the digitized output of a hydrophone suspended beneath an ocean ice surface. Details of the data collection and analysis of the data are provided by Dwyer [22].

The data, covering a time span of approximately 10 minutes, consists of 6006 records of 1024 data points, sampled at a 10 kHz rate. As Dwyer points out, the data taken as a whole appears to be nonstationary and non-Gaussian; upon further examination, however, it appears that only certain of the noise records deviate significantly from a nominal Gaussian distribution. The following argument is raised:

Define the estimated mean of data record $k$ as

$$ M_1(k) = \frac{1}{1024} \sum_{i=1}^{1024} n_{k,i} $$

where $n_{k,i}$ is the $i$th sample of record $k$. The $r$th central moment of data record $k$ may be computed for $r > 1$ as

$$ M_r(k) = \frac{1}{1024} \sum_{i=1}^{1024} \left[ n_{k,i} - M_1(k) \right]^r . $$

Using the second, third and fourth central moments, the skewness $\beta_1$ and the kurtosis $\beta_2$ of a sample distribution may be computed as

$$ \beta_1 = \frac{M_3}{M_2^{3/2}} , \qquad \beta_2 = \frac{M_4}{M_2^2} . $$

Then for each record $k$, the sample mean $\mu_1$, sample variance $\sigma^2$, sample skewness $\beta_1$, and sample kurtosis $\beta_2$ may be plotted as a function of $k$. Examination of the plots, and of the kurtosis plot in particular, reveals that a proportion of the sample records deviate from nominal values. The nominal kurtosis is approximately 3, the exact value of kurtosis for the Gaussian density. Occasionally, values $\beta_2 \gg 3$ are observed. It is these records which are of interest in the thesis, for a high kurtosis value for a unimodal density indicates a heavy-tailed density. For example, the kurtosis of the Laplace density is 6. The sample cdf of the data indicates that the density is unimodal; therefore the conclusion is that the records with a high sample kurtosis have a heavy tailed non-Gaussian density.

The data from records with kurtosis exceeding 4 was collected for use in simulation in the following chapters. Of the 6006 records, 58 met the selection criterion. Figure A2.1 presents a list, indexed by BLOCK, of their record numbers (RECORD), sample means (MEAN), variances (VARIANCE), skewness (BETA1) and kurtosis (BETA2). Figs. A2.2 - A2.5 present this data in graphical form. Data samples from a "typical" high kurtosis block are presented in Fig. A2.6. Note that the deviation from a nominal Gaussian density is apparently confined to two regions where the data has a much greater spread than the majority of data in the block.
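The record screening described above amounts to a few lines of arithmetic per record. The sketch below follows the moment definitions of this appendix; the function names, the use of plain Python lists, and the 1-based record numbering are assumptions made for illustration.

```python
def sample_moments(record):
    """Return (mean, variance, skewness beta1, kurtosis beta2) of one record.

    Central moments are normalized by the record length, as in the appendix.
    A constant record (zero variance) would raise ZeroDivisionError.
    """
    n = len(record)
    m1 = sum(record) / n

    def central(r):
        return sum((v - m1) ** r for v in record) / n

    m2, m3, m4 = central(2), central(3), central(4)
    return m1, m2, m3 / m2 ** 1.5, m4 / m2 ** 2

def select_high_kurtosis(records, threshold=4.0):
    """Return the 1-based indices of records whose kurtosis exceeds threshold."""
    return [k for k, rec in enumerate(records, start=1)
            if sample_moments(rec)[3] > threshold]

# Toy records: the second contains a single large spike
records = [[-1.0, 1.0, -1.0, 1.0], [0.0] * 9 + [10.0]]
print(select_high_kurtosis(records))
```

Here the threshold of 4 mirrors the selection criterion in the text; the toy records stand in for the 1024-point data records.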


BLOCK  RECORD    MEAN  VARIANCE    BETA1     BETA2
    1      41  0.1630    0.2243  -0.6789   17.6774
    2      48  0.1553    0.1235   0.0230    5.7160
    3      57  0.1629    0.1742  -0.2864    5.5454
    4      68  0.1698    0.1955   0.0075    4.2291
    5      69  0.1566    0.2046   0.2204    4.2246
    6     *19  0.1664    0.2026  -0.1819    4.3539
    7     708  0.1657    0.1937  -0.2008    4.1753
    8     724  0.1649    0.2673  -0.3840    4.1981
    9     726  0.1630    0.2570  -0.3672    4.1651
   10     730  0.1536    0.3806  -0.8361    4.0926
   11     732  0.1641    0.3958  -0.6972    4.1245
   12     791  0.1684    0.1871  -0.3765    4.0627
   13     867  0.1708    0.2621  -0.4333    4.1756
   14    1261  0.1678    0.2210  -0.6956   12.4215
   15    1349  0.1622    0.1303   0.2202    4.0059
   16    1362  0.1549    0.0918  -3.1228   28.1921
   17    1363  0.1694    0.0629   0.1948    5.0415
   18    1377  0.1606    0.1097  -0.0032    9.8037
   19    1380  0.1703    0.0980   0.0397    4.2458
   20    1384  0.1b48    0.0968  -0.2784    4.7318
   21    1388  0.1692    0.0656  -0.1791    5.8106
   22    1474  0.1701    0.0929   0.2060    4.1320
   23    1464  0.1664    0.0647   0.2371    4.0226
   24    1487  0.1583    0.1700   0.5149   24.1614
   25    1544  0.1648    0.1096   0.0005    5.9730
   26    1878  0.1678    0.1431  -0.0360    5.6124
   27    1888  0.1671    0.1056   0.2052    7.7451
   28    1916  0.1o95    0.0594  -0.1035    4.3778
   29    1918  0.1730    0.0842  -0.0928    4.3102
   30    1924  0.1644    0.0675  -0.2626    4.1122
   31    193e  0.1554    0.0962  -0.5538    7.1052
   32    1943  0.1592    0.0565   0.5076    5.5115
   33    1946  0.1658    0.0630  -0.6095    5.1463
   34    1962  0.1500    0.0656   0.1034    5.1285
   35    2041  0.1596    0.1984  -0.8858   11.2526
   36    2042  0.1717    0.2063   0.6156   -\ 4 30U
   37    2066  0.1709    0.1523  -1.56%    29.5222
   38    2107  0.1671    0.1113   0.0973    5.2402
   39    2114  0.1636    0.0939   0.3051    4.1472
   40    2132  0.1697    0.1412   0.2270    5.0950
   41    2177  0.1659    0.1000  -1.0546   13.4157
   42    2220  0.1422    0.2481  -2.7230   22.9806
   43    2226  0.1672    0.1857  -0.2045    7.0620
   44    2229  0.1603    0.1425   0.1391   12.5198
   45    2230  0.1737    0.1489   0.2172    6.1677
   46    2233  0.1648    0.1043   0.1070    4.1405
   47    2236  0.1592    0.1363  -1.3927   14.5487
   48    2238  0.1672    0.1460   0.3935    5.7368
   49    2239  0.1651    0.1770   0.3023   11.6600
   50    2240  0.1640    0.1321  -0.1224    6.1600
   51    2242  0.1728    0.1401   0.2159    5.0946
   52    2246  0.1645    0.2141   1.0191   10.2487
   53    2247  0.1672    0.1828   0.0207   11.3074
   54    2248  0.1133    0.3774  -2.9252   10.5053
   55    2249  0.1690    0.1384   0.1168    6.0615
   56    2250  0.1690    0.1677   0.1659    4.8567
   57    2254  0.1642    0.1657   0.1431    8.1808
   58    2261  0.1713    0.0788  -0.2939    4.0476

Fig. A2.1. Table of data record sample moments for records with kurtosis exceeding 4. BLOCK is the index of the selected records. RECORD indexes the 6006 data records of 1024 samples.

Fig. A2.2. Sample mean $\mu_1$ for the data records with kurtosis exceeding 4. Data records indexed by BLOCK.

Fig. A2.3. Sample variance $\sigma^2$ for the data records with kurtosis exceeding 4. Data records indexed by BLOCK.

Fig. A2.4. Sample skewness $\beta_1$ for the data records with kurtosis exceeding 4. Data records indexed by BLOCK.

Fig. A2.5. Sample kurtosis $\beta_2$ for the data records with kurtosis exceeding 4. Data records indexed by BLOCK.

Fig. A2.6. Sample data from data record 2220. The sample moments are $\mu_1 = .1422$; $\sigma^2 = .2481$; $\beta_1 = -2.723$; and $\beta_2 = 22.98$. RECORD=2220, and BLOCK=42.

Adaptive Optimization of
Suboptimal Nonlinearities

This chapter investigates the feasibility of two simple alternatives to locally optimal detector nonlinearities. Provided some simple measurements on the noise density are available, it is demonstrated that it is possible to construct nonlinearities which produce near-optimal levels of performance in several specific noise environments. Section 1 presents an overview of some practical issues which motivate the necessity for near-optimal, yet uncomplicated, detector nonlinearities.

The basic philosophy forwarded is that nonlinearities designed for practical detectors should have an uncomplicated structure that may be easily adapted to changing noise situations. Two main issues are addressed: the first is development of algorithms to determine the gross shape (input-output relationship) of the nonlinearity. Of primary importance is the tail behavior of the nonlinearity, for it will determine the degree to which impulsively contaminated observations can influence the detector test statistic. Sections 2 and 3 discuss two alternatives to optimal nonlinearities. The second issue resolved is the matter of scaling the input to the nonlinearity. This problem is essentially equivalent to determining the noise variance and scaling the input. However, the usual estimators of variance depending upon the squares of the noise observations are inefficient when the noise has a heavy tailed density. An alternative scaling method is developed near the ends of Sections 2 and 3.

Sections  4  and  5  provide  a  numerical  comparison  of  the  suboptimal 
nonlinearities  for  cases  where  the  true  noise  density  is  known.  Also,  the 
algorithms  are  simulated  using  observed  noise  data.  Section  6  provides  a 
review  of  the  techniques  and  results  presented  in  this  chapter. 

1.  Introduction 

In principle, the design of a Neyman-Pearson (NP) or locally optimal (LO) detector nonlinearity for a signal in additive white noise is a simple matter when the noise statistics are known exactly. There are, unfortunately, some practical problems related to the implementation of a nonlinearity. The most significant problem is simply that the true noise statistics are usually unknown, or changing in time. While well known techniques exist for obtaining the noise density [1-3], they often require a fairly large observation period to achieve an acceptably smooth estimate. For example, Wilson and Powell [4] present kernel function type density estimates of several observed noises. The estimates are noisy and rough looking when the logarithms of the densities are plotted. An LO nonlinearity could be estimated from the derivative of the log of the densities, but this would further emphasize the roughness. Additional smoothing of the density would be required if an acceptable nonlinearity is to be obtained, and even then the nonlinearity may be still somewhat noisy or rough looking (e.g., [5]).

Another problem is that, even when the density is known or can be estimated smoothly, the related memoryless nonlinearity (ZNL) itself is sometimes complicated enough to make implementation or adaptation relatively difficult. For example, the Middleton Class A and Class B noise models have been proposed as physically based canonical representations for non-Gaussian noise, with parameters that may be calculated directly from physical considerations. Both models are infinite series [6,7]; the Class A series comprises weighted Gaussian density terms, while the Class B series comprises confluent hypergeometric functions, which themselves are defined generally via an infinite series [8, p. 504]. The detector nonlinearities associated with these models may be calculated directly, but at the expense of a high computational burden. Adaptation of the nonlinearities incurs a similar computational cost.

One approach toward overcoming these difficulties with the optimal nonlinearity is to use a suboptimal ZNL that has nearly optimal performance, but has a structure that is simple to implement and easily adaptable. Some recent examples of this approach include the work by Miller and Thomas [9], Modestino [10], Ingram and Houle [11], Ziemer and Fluchel [12], and Vastola [18,19].

This chapter presents an approach to the design of noise-adaptive suboptimal detectors with these ideas in mind, focusing on the locally optimal detection problem and noise environment of Chapter 2.

2. Approximation via Noise Tail Matching

The previous chapter presented a discussion of a particular type of non-Gaussian noise environment where the mode of the noise pdf appeared as Gaussian-like, but the density tails were much heavier than the Gaussian. As noted, the LO nonlinearities associated with these types of densities have a nearly linear processing characteristic for input values near the pdf mode. On the other hand, the tail behavior of the LO nonlinearities ranges from linear for a noise pdf with Gaussian tails, to a limiter for exponentially decreasing pdf tails, to a blanker for algebraically decreasing pdf tails. In general, the heavier-tailed the noise density is relative to the Gaussian pdf, the more severely curtailed is the effect of large data observations.

One objective of a noise adaptive nonlinearity, then, should be to
relate the ZNL tail behavior to the actually observed noise pdf tail
behavior. The main idea of the algorithm in this section is to establish a
relation between a measure of tail heaviness and a member of a convenient
class of heavy tailed densities. Rather than performing a
parametric fitting within the density class, the algorithm chooses a density
whose tail characteristics have the same tail heaviness measure as
the actual noise density. The nonlinearity tails are thus determined by
the member of the density class chosen. The central region of the nonlinearity
joins the two tails with some function which gives a desirable
near-linear processing.

Tail Selection Procedure


This idea is clarified and illustrated by the following proposal. It has
been reported by Walt and Maxwell [21] that the generalized Gaussian
density

$$f_c(x) = \frac{c\,\gamma}{2\,\Gamma(1/c)}\, e^{-|\gamma x|^c} \tag{3.1}$$

in certain instances can describe the pdf tails of physical noise sources.
For a noise variance of $\sigma^2$, the parameter $\gamma$ is defined by

$$\gamma = \left[ \frac{\Gamma(3/c)}{\sigma^2\,\Gamma(1/c)} \right]^{1/2}$$


The  corresponding  LO  nonlinearities,  shown  earlier  in  Fig.  2.18,  may  be 
written  as 


$$g_{LO}(x) = c\,\gamma^c\,|x|^{c-1}\,\mathrm{sgn}(x) \tag{3.2}$$

with $c$ conveniently parameterizing ZNL tail behavior. Therefore, we
model the observed noise pdf tails via the generalized Gaussian family. If
these density tails are used to generate a suboptimal LO nonlinearity, it
will have power law tails described by

$$g_{tm}(x) = \hat{c}\,\gamma^{\hat{c}}\,|x|^{\hat{c}-1}\,\mathrm{sgn}(x) \quad \text{for } |x| > x_0 \tag{3.3}$$

It is necessary to find a value $\hat{c}$ such that $f_{\hat{c}}$ is a good approximation to
the tail behavior of the true, but unknown, underlying noise density. A
simple way to do this is to equate the tail probability of $f_{\hat{c}}$ with the
observed tail mass

$$P_T = \frac{1}{N} \sum_{i=1}^{N} I(|n_i| > T) \tag{3.4}$$

Here, $I$ is the indicator function and the $n_i$ are the noise observations
presumed available from a noise reference channel. The exponent $\hat{c}$ may
be estimated as the value giving

$$2 \int_T^{\infty} f_{\hat{c}}(x)\,dx = P_T \tag{3.5}$$

where, for convenience, it is assumed that the noise has zero mean and
unit variance. The estimate $\hat{c}$ is defined implicitly by the integral in
(3.5); therefore it is desirable to derive a simpler explicit relationship

$$\hat{c} = h_T(P_T) \tag{3.6}$$

One obvious method for obtaining (3.6) is to first calculate $P_T$ as a function
of $c$, and then use interpolation to find the inverse relation $h_T$. This
tabulated version of $h_T$ is shown in Fig. 3.1.
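The tabulate-then-interpolate step can be sketched as follows. This is a minimal illustration, not the thesis's own program: it assumes the unit-variance generalized Gaussian density with scale factor $\gamma$ as defined above, restricts the table to $1 \le c \le 2$ (where the tail mass at $T = 3$ is taken to be monotone in $c$), and uses the identity that the two-sided tail mass equals one minus the regularized lower incomplete gamma function $P(1/c, (\gamma T)^c)$. All function names are hypothetical.

```python
import math

def gg_scale(c, sigma=1.0):
    """Scale factor gamma of a generalized Gaussian with exponent c and std sigma."""
    return math.sqrt(math.gamma(3.0 / c) / math.gamma(1.0 / c)) / sigma

def reg_lower_gamma(s, x, eps=1e-14):
    """Regularized lower incomplete gamma P(s, x), computed from its power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / s
    total = term
    n = 0
    while abs(term) > eps * abs(total):
        n += 1
        term *= x / (s + n)
        total += term
    return total * math.exp(-x + s * math.log(x) - math.lgamma(s))

def tail_prob(c, T, sigma=1.0):
    """Two-sided tail mass P(|X| > T) of the generalized Gaussian density."""
    x = (gg_scale(c, sigma) * T) ** c
    return 1.0 - reg_lower_gamma(1.0 / c, x)

def tabulate(T, c_lo=1.0, c_hi=2.0, steps=200):
    """Table of (c, P_T) pairs on an evenly spaced grid of exponents."""
    cs = [c_lo + (c_hi - c_lo) * i / steps for i in range(steps + 1)]
    return cs, [tail_prob(c, T) for c in cs]

def h_T(p, cs, ps):
    """Invert the table by linear interpolation (assumes ps is monotone)."""
    for i in range(len(cs) - 1):
        lo, hi = sorted((ps[i], ps[i + 1]))
        if lo <= p <= hi:
            if ps[i + 1] == ps[i]:
                return cs[i]
            w = (p - ps[i]) / (ps[i + 1] - ps[i])
            return cs[i] + w * (cs[i + 1] - cs[i])
    raise ValueError("tail probability lies outside the tabulated range")

cs, ps = tabulate(T=3.0)
c_est = h_T(0.008, cs, ps)  # exponent whose tail mass beyond 3 sigma is 0.008
```

Tail masses outside the tabulated range raise an error here; a table extended to smaller exponents would need to handle the multiple-valued behavior of $h_T$.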

With $\sigma^2$ fixed and $c$ small, the value of $\gamma$, a scale factor, becomes
large. Even though $f_c(x)$ approaches zero asymptotically at a much
slower rate than the Gaussian pdf as $|x|$ becomes large, the total probability
mass in the tails is quite small. As a result, $h_T$ is multiple valued,
the density is extremely peaked, and the LO nonlinearity has a discontinuity
at the origin. For $c < 1$, the requirement that the suboptimal ZNL
be nearly linear at the origin clearly is not met by (3.2).

Fig. 3.1. The exact and approximate relationships between
exponent $c$ and tail probability $P_T$ for the unit variance generalized
Gaussian density, for various thresholds $T$. The exact
relations $h_T$ are the solid curved lines, and the linear
approximations $\hat{h}_T$ are the broken straight lines.

The objective in using the generalized Gaussian pdf is to relate the
tail heaviness of an observed noise to a parameter governing the shape of
the ZNL tail. Therefore we replace the anomalous behavior of the true
function $h_T$ with a simple linear relation

$$\hat{h}_T(P_T) = k_1 P_T + k_2 \tag{3.7}$$

where $k_1$ and $k_2$ are chosen to approximate (3.6) for a particular value of
$T$. Several sample approximations are plotted as the broken lines on Fig.
3.1. The values for $k_1$ and $k_2$ are chosen so that when $P_T$ corresponds to
Gaussian or exponential noise tails, (3.7) gives $c = 2$ and $c = 1$, respectively.
Note that the linear relation allows the value of $c$ to be negative
for large tail probabilities.
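A short sketch of how the two constants of the linear relation might be computed for a given threshold is shown below. It assumes unit-variance noise, reads "exponential noise tails" as the unit-variance Laplace density, and uses the closed-form two-sided tail masses $\mathrm{erfc}(T/\sqrt{2})$ and $e^{-\sqrt{2}\,T}$ for the two anchor cases. The function names are hypothetical.

```python
import math

def linear_tail_map(T):
    """Return (k1, k2) so that k1 * P + k2 equals 2 at the Gaussian tail mass
    and 1 at the Laplace tail mass, both taken for unit variance."""
    p_gauss = math.erfc(T / math.sqrt(2.0))    # P(|X| > T), Gaussian
    p_exp = math.exp(-math.sqrt(2.0) * T)      # P(|X| > T), Laplace
    k1 = (2.0 - 1.0) / (p_gauss - p_exp)
    k2 = 2.0 - k1 * p_gauss
    return k1, k2

def observed_tail_mass(samples, T):
    """Fraction of observations whose magnitude exceeds T."""
    return sum(1 for n in samples if abs(n) > T) / len(samples)

def c_hat(p_tail, T):
    """Exponent estimate from the linear relation; may be negative for heavy tails."""
    k1, k2 = linear_tail_map(T)
    return k1 * p_tail + k2

k1, k2 = linear_tail_map(3.0)
```

Because the slope is negative, tail masses larger than the Laplace value map to exponents below one and eventually below zero, matching the remark above.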

The tail measurement threshold $T$ must be chosen prior to estimating
parameters $k_1$ and $k_2$. One way to pick $T$ is to choose a value solving

$$\min_T E_c[\mathrm{var}\,\hat{c}\,] = \min_T \frac{k_1^2}{N}\,E_c[P_T(1-P_T)] \tag{3.8}$$

for some prior density on the parameter $c$, where $N$ is the number of
noise observations. For $c$ uniformly distributed on the interval [1,2], the
value $T = 3\sigma$ approximately minimizes (3.8). In practice, some better
knowledge of the distribution of $c$ should develop, and $T$ may be adjusted
to minimize (3.8).
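As a brief supporting step, the variance that appears in the threshold criterion can be worked out under two assumptions: the noise observations are independent and identically distributed, and the exponent estimate is formed from the observed tail fraction through the linear relation (3.7). The number of observations exceeding $T$ in magnitude is then binomial with parameters $N$ and $P_T$, so that

$$\mathrm{var}\,\hat{c} = k_1^2\,\mathrm{var}\!\left[\frac{1}{N}\sum_{i=1}^{N} I(|n_i| > T)\right] = \frac{k_1^2}{N}\,P_T\,(1 - P_T)$$

Both $k_1$ and $P_T$ depend on $T$, which is why the minimization over $T$ is not trivial: raising the threshold shrinks $P_T(1-P_T)$ but steepens the linear map.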

Central  Region  Selection 

The LO nonlinearities of the generalized Gaussian family have desirable
tail behavior, but for small values of $c$ the behavior does not meet
the constraint of linearity near the origin. To eliminate this behavior, the
ZNL needs modification in the region near the origin. One way to
do this is to replace $g_{tm}(x)$, for $x$ near zero, with a function that will
smoothly connect the two tails and have linear-like behavior near the origin.
A suitable family of functions is the set of polynomials $p(x)$ with the following
characteristics:

$$p(x) = a_3 x^3 + a_2 x^2 + a_1 x + a_0 \quad \text{for } 0 \le x \le x_0$$
$$p(0) = 0$$
$$p(|{\pm}x_0|)\,\mathrm{sgn}(\pm x_0) = g_{tm}(\pm x_0)$$
$$p'(|{\pm}x_0|) = g'_{tm}(\pm x_0)$$
$$p''(|{\pm}x_0|)\,\mathrm{sgn}(\pm x_0) = g''_{tm}(\pm x_0)$$

Also, because $p(x)$ is a third order polynomial, $p'(x) \approx a_1$ for $|x|$ very
near zero. This implies that $p$ will be nearly linear in a neighborhood
about the origin, for there its slope is approximately independent of $x$.
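The matching conditions above determine the cubic in closed form. The sketch below assumes the tail has the form $c\,x^{c-1}$ for $x > 0$ (any scale factor is taken to be absorbed into the input scaling) and extends the result to negative inputs by odd symmetry. The function names are hypothetical.

```python
def cubic_coeffs(c, x0):
    """Coefficients (a3, a2, a1) of p(x) = a3 x^3 + a2 x^2 + a1 x, with a0 = 0,
    matching the value, slope and curvature of the tail c * x**(c-1) at x0."""
    g0 = c * x0 ** (c - 1.0)                          # tail value at x0
    g1 = c * (c - 1.0) * x0 ** (c - 2.0)              # tail slope at x0
    g2 = c * (c - 1.0) * (c - 2.0) * x0 ** (c - 3.0)  # tail curvature at x0
    a3 = (g0 - g1 * x0 + 0.5 * g2 * x0 ** 2) / x0 ** 3
    a2 = 0.5 * g2 - 3.0 * a3 * x0
    a1 = g1 - g2 * x0 + 3.0 * a3 * x0 ** 2
    return a3, a2, a1

def g_tm(x, c, x0=3.0):
    """Tail-matched nonlinearity: cubic centre joined to power-law tails."""
    a3, a2, a1 = cubic_coeffs(c, x0)
    ax = abs(x)
    if ax <= x0:
        y = ((a3 * ax + a2) * ax + a1) * ax
    else:
        y = c * ax ** (c - 1.0)
    return y if x >= 0 else -y

a3, a2, a1 = cubic_coeffs(0.5, 3.0)
```

For $c = 2$ the tail is already linear and the cubic collapses to $p(x) = 2x$, which gives a quick sanity check on the coefficients.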


Scaling


The choice of tail behavior via $c$ and the point $x_0$ completely specifies
$p(x)$. The method for choosing $c$ has already been specified, leaving $x_0$ as
the sole free parameter. A method equivalent to choosing the proper $x_0$
is to choose an arbitrary $x_0$ and scale the input to the ZNL with a factor $v$.
It is reasonable to choose $v$ to maximize the efficacy of the ZNL. For an
arbitrary nonlinearity $g$, efficacy as a function of $v$ may be rewritten as

$$\eta_f(g; v) = \frac{v^2\,E_f^2[g'(vx)]}{E_f[g^2(vx)] - E_f^2[g(vx)]} \tag{3.9}$$

In principle, (3.9) can be maximized exactly. Unfortunately, a closed form
solution for $v$ cannot be found in general, and the density $f$ is generally
unknown. These problems may be circumvented by approximating the
expectations with integrations over the noise empirical distribution, and
maximizing (3.9) via stochastic approximation methods.
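The empirical version of (3.9) replaces each expectation with a sample average. The sketch below illustrates this with a simple limiter as the nonlinearity and a coarse grid search over the scale factor. The grid search is a stand-in chosen for brevity, not the stochastic approximation procedure the text refers to, and the limiter, the synthetic Laplace-like samples and all names are assumptions made for illustration.

```python
import random

def empirical_efficacy(g, dg, samples, v):
    """Sample-average estimate of efficacy for the scaled nonlinearity g(v x)."""
    n = len(samples)
    m_dg = sum(dg(v * x) for x in samples) / n
    m_g = sum(g(v * x) for x in samples) / n
    m_g2 = sum(g(v * x) ** 2 for x in samples) / n
    return (v * m_dg) ** 2 / (m_g2 - m_g ** 2)

def limiter(x):
    """Unit amplifier-limiter, used here only as an example nonlinearity."""
    return max(-1.0, min(1.0, x))

def d_limiter(x):
    """Derivative of the limiter (zero outside the linear region)."""
    return 1.0 if abs(x) < 1.0 else 0.0

def best_scale(g, dg, samples, grid):
    """Scale factor on the grid with the largest empirical efficacy."""
    return max(grid, key=lambda v: empirical_efficacy(g, dg, samples, v))

rng = random.Random(0)
# Synthetic heavy-tailed samples: exponential magnitudes with random signs.
noise = [rng.choice((-1.0, 1.0)) * rng.expovariate(1.0) for _ in range(5000)]
grid = [0.25 * k for k in range(1, 41)]
v_star = best_scale(limiter, d_limiter, noise, grid)
```

For the linear detector $g(x) = x$ the estimate reduces to the reciprocal of the sample variance for every $v$, which is a convenient check that the scaling has been applied consistently in numerator and denominator.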

At this point, specification of the suboptimal nonlinearity $g_{tm}$ is complete,
and it may be written as

$$g_{tm}(vx) = \begin{cases} p(|vx|)\,\mathrm{sgn}(vx) & \text{if } |vx| \le x_0 \\ c\,|vx|^{c-1}\,\mathrm{sgn}(vx) & \text{if } |vx| > x_0 \end{cases} \tag{3.10}$$

Figures 3.2 to 3.4 give some examples of the types of nonlinearities available
using this approximation method.


3.  Optimization  via  Efficacy  Maximization 

The  previous  section  developed  a  method  for  choosing  suboptimal 
nonlinearities  in  what  is  primarily  an  "open-loop"  fashion:  A  relation  was 
established  a  priori  between  observed  tail  heaviness  and  tail  heaviness  of 
known  noise  densities.  The  tails  of  the  known  densities  were  then  used  to 
generate  tails  for  the  suboptimal  ZNL.  It  is  not  obvious  that  this  method 
is  optimal  in  any  sense,  save  for  its  sheer  simplicity. 

Another  approach  is  to  choose  a  class  of  nonlinearities  of  desirable 
shape  and  convenient  parameterization,  and  then  find  the  member  of  the 
class  which  maximizes  performance.  This  type  of  approach  may  be  con¬ 
sidered  to  be  a  "closed-loop"  technique,  for  measurements  on  the 
observed  noise  density  lead  to  selection  of  the  optimal  member  of  the 
nonlinearity  class;  the  performance  measure  provides  "feedback"  to  the 
selection  algorithm. 

Again, under the detection situation and noise environment
described in Chapter 2, the following suboptimal ZNL is proposed,
comprising a central linear region and two linear tail regions:

$$g_{2l}(x) = \mathrm{sgn}(x) \cdot \begin{cases} |x| & \text{for } |x| \le a \\ b|x| + a(1-b) & \text{for } a < |x| < x_T \end{cases} \tag{3.11}$$

where

$$x_T = \begin{cases} \dfrac{a(b-1)}{b} & \text{for } b < 0 \\ \infty & \text{for } b \ge 0 \end{cases} \tag{3.12}$$

Fig. 3.2. Representative nonlinearities $g_{tm}$ for $x_0 = 3$, $v = 1$,
and various $c$.

Fig. 3.3. Representative nonlinearities $g_{tm}$ for $x_0 = 3$, various
$v$, and $c = .5$.

Fig. 3.4. Representative nonlinearities $g_{tm}$ for $x_0 = 3$, various
$v$, and $c = 1$.

The parameter $a$ governs the breakpoint between the central and tail
regions, and $b$ governs the slope of the tail region segments. A pair of
representative examples of $g_{2l}(x)$ are given in Figs. 3.5 and 3.6.
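The two-slope nonlinearity is simple enough to state directly in code. The sketch below follows (3.11) and (3.12) and additionally assumes that the output is zero at and beyond the truncation point when the tail slope is negative, which is how the finite-support case is described later in the chapter. The function names are hypothetical.

```python
import math

def x_T(a, b):
    """Point where a negative-slope tail returns to the x-axis; infinite otherwise."""
    return a * (b - 1.0) / b if b < 0 else math.inf

def g_2l(x, a, b):
    """Two-slope ZNL: unit slope for |x| <= a, slope b between a and x_T."""
    ax = abs(x)
    if ax <= a:
        y = ax
    elif ax < x_T(a, b):
        y = b * ax + a * (1.0 - b)
    else:
        y = 0.0  # assumption: output is zero beyond the truncation point
    return y if x >= 0 else -y
```

With $a = 2$ and $b = .25$ the tails keep rising, while with $a = 2$ and $b = -.5$ they fall back to zero at $|x| = 6$.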


Procedure  for  Estimating  Tail  Slope 

Initially, we will assume $a$ is fixed and turn attention to the problem
of estimating $b$. Some comments will be made in the following subsection
on the issue of finding the breakpoint.

The usual performance measure for a LO detector $g$ is efficacy, discussed
in Chapter 2 and recalled here as

$$\eta_f(g) = \frac{\left[\int g'(x) f(x)\,dx\right]^2}{\int g^2(x) f(x)\,dx} \tag{3.13}$$

where the underlying noise density is $f$ and $g$ has zero mean under $f$.


Figs. 3.5 and 3.6 highlight the fact that there are two distinct possibilities
for the shape of $g_{2l}$. In Fig. 3.5, the slope parameter $b$ is greater
than zero, and $g_{2l,b+}$ is nonzero over the entire tail region. In Fig. 3.6, the
parameter $b$ is less than zero, and $g_{2l,b-}$ is nonzero only over a finite
interval.

Using (3.13), the efficacy of $g_{2l,b+}$ may be written as

$$\eta(g_{2l,b+}) = \frac{4\left[\int_0^a f + b\int_a^\infty f\right]^2}{2\int_0^a x^2 f + 2\int_a^\infty (bx + a(1-b))^2 f} \tag{3.14}$$

Fig. 3.5. A representative example of $g_{2l,b+}$ for $a = 2$ and
$b = .25$.

Fig. 3.6. A representative example of $g_{2l,b-}$ for $a = 2$ and
$b = -.5$.

Here, $\eta(g_{2l,b+})$ is an explicit function of $b$, so it may be maximized by
finding $b^*$ such that $\partial \eta(g_{2l,b+}) / \partial b\,|_{b=b^*} = 0$. In Appendix 3.1, an explicit solution
(A3.6) for $b^*$ is found to be a rational function of the partial moments of
$f$, which depend implicitly on the choice of $a$. It was not possible to find
an expression which yields explicitly a value $a^*$ that maximizes (3.14) as
a function of $a$.
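For a known symmetric density, the efficacy expression (3.14) can be checked numerically. The sketch below evaluates it with Simpson's rule, truncating the infinite upper limit at a large finite value. It does not reproduce the closed-form solution of Appendix 3.1; the quadrature routine, the truncation point and the two example densities are assumptions made for illustration.

```python
import math

def simpson(h, lo, hi, n=2000):
    """Composite Simpson's rule with an even number n of subintervals."""
    step = (hi - lo) / n
    s = h(lo) + h(hi)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * h(lo + i * step)
    return s * step / 3.0

def efficacy_2l(f, a, b, upper=40.0):
    """Efficacy of the two-slope ZNL with non-truncated tails, for a symmetric
    density f. The infinite integration limit is replaced by `upper`."""
    num = 2.0 * (simpson(f, 0.0, a) + b * simpson(f, a, upper))
    den = 2.0 * simpson(lambda x: x * x * f(x), 0.0, a)
    den += 2.0 * simpson(lambda x: (b * x + a * (1.0 - b)) ** 2 * f(x), a, upper)
    return num * num / den

def gauss_pdf(x):
    """Unit-variance Gaussian density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def laplace_pdf(x):
    """Unit-variance Laplace density."""
    return math.exp(-math.sqrt(2.0) * abs(x)) / math.sqrt(2.0)

eta_linear = efficacy_2l(gauss_pdf, 2.0, 1.0)    # b = 1 recovers the linear detector
eta_limiter = efficacy_2l(laplace_pdf, 1.0, 0.0)  # b = 0 gives an amplifier-limiter
```

In unit-variance Gaussian noise the choice $b = 1$ gives an efficacy of one and any other slope gives less, while in unit-variance Laplace noise the limiter ($b = 0$, $a = 1$) already exceeds one.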

The explicit solution (A3.6) was derived to find the $b^*$ that maximizes
$\eta(g_{2l,b+})$. When the solution $b^*$ is non-negative, the tails of the nonlinearity
diverge from the x-axis, and $g_{2l}$ has support over the entire real line.
Thus, the formulation for efficacy given in (3.14) is correct, and the solution
(A3.6) is correct.


What if (A3.6) yields a result $b^* < 0$? The result $b^*$ is still valid, but
the nonlinearity for which efficacy is maximized is not $g_{2l,b-}$. Certain
integrals in (3.14) have range of integration $(a,\infty)$, whereas the correct
expression for the efficacy of $g_{2l,b-}$ may be written as

$$\eta(g_{2l,b-}) = \frac{4\left[\int_0^a f + b\int_a^{x_T} f\right]^2}{2\int_0^a x^2 f + 2\int_a^{x_T} (bx + a(1-b))^2 f} \tag{3.15}$$

with $x_T$ given by (3.12). Note that if $b^* < 0$ and the value $x_T = \infty$ is used,
then what is actually maximized is the efficacy of a nonlinearity $g_v$ with
virtual tails such as those shown in Fig. 3.7.

It is desirable to find an explicit solution for $b^*$ which takes into
account the fact that if $b^* < 0$, then the solution should have been
generated by maximizing (3.15) instead of maximizing (3.14). However,
the value $x_T$ depends on $b^*$, and a closed-form expression for $b^*$ could
not be developed, as was done for (3.14) where the limits of integration do
not depend on $b^*$.

Fig. 3.7. The incorrectly optimized $g_{2l,b-}$ is a truncated version
of $g_v$, whose virtual tails are artifacts of the optimization procedure.

It is possible to salvage the solution (A3.6). First, note that when
(A3.6) is applied without modification and gives $b^* < 0$, the value $|b^*|$ will
actually have been underestimated, for this will reduce the mean square
error between the virtual tails of $g_v$ and the x-axis, at the expense of
increasing the mean square error between the tails dictated by the
incorrect $b^*$ and the tails of the properly optimized $g_{2l,b-}$ in the interval
$[a, x_T]$. Overall, this has the effect of minimizing the performance degradation
due to the virtual tails [15]. A further discussion of the mean
square error issue in ZNL approximation may be found in Chapter 5.

Two options are available. One is to apply (A3.6), obtain $b^*$, and if it
is less than zero, calculate $x_T$ using (3.12) and merely truncate the nonlinearity
at $\pm x_T$. Appendix 3.2 demonstrates that truncating at $\pm x_T$
yields better performance than if the virtual tails were ignored and
allowed to remain. The other option is to apply (A3.6) iteratively. For
startup, (A3.6) is applied directly, giving $b^*_0$. Eqn. (3.12) may be used to
give an initial value for $x_T$, and the integrals in (A3.6) may be modified to
have range of integration $(a, x_T)$ instead of $(a, \infty)$. The appropriately
modified (A3.6) gives $b^*_1$, and the process may continue in this fashion
until $|b^*_{n+1} - b^*_n|$ is less than a predetermined accuracy.

Scaling

The issue of choosing $a^*$ may be approached by first considering Fig.
3.8. In this example, the value of $\eta(g_{2l})$ reaches a maximum at $a \approx 2.5$,
with the performance being fairly insensitive to the exact value of $a$. In
fact, a 50% change from $a = 2.5$ yields less than a 9% change in efficacy.
This suggests that a simple method may be used to find a nearly optimal
estimate of $a^*$. First, arbitrarily choose three different breakpoints $a_i$
for $i = 1,2,3$ and find the associated optimal tail slopes $b^*_i$. Then evaluate
$\eta_i = \eta(g_{2l})$ for $i = 1,2,3$ and fit a parabola through the three pairs of
points $(a_i, \eta_i)$. Finally, choose $\hat{a}^*$ as that point which maximizes the value
of the fitted parabola.
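The three-point parabola fit has a closed-form vertex, sketched below. The function name and the example points are hypothetical; the check that the fitted parabola opens downward is an added safeguard, since otherwise the vertex would be a minimum rather than a maximum.

```python
def parabola_peak(points):
    """Abscissa of the maximum of the parabola through three (a_i, eta_i) pairs."""
    (x1, y1), (x2, y2), (x3, y3) = points
    denom = (x1 - x2) * (x1 - x3) * (x2 - x3)
    # Quadratic and linear coefficients of the interpolating parabola.
    A = (x3 * (y2 - y1) + x2 * (y1 - y3) + x1 * (y3 - y2)) / denom
    B = (x3 ** 2 * (y1 - y2) + x2 ** 2 * (y3 - y1) + x1 ** 2 * (y2 - y3)) / denom
    if A >= 0:
        raise ValueError("fitted parabola has no maximum; choose other breakpoints")
    return -B / (2.0 * A)

# Example: three samples of eta(a) = 1 - (a - 2.5)^2, which peaks at a = 2.5.
a_hat = parabola_peak([(1.0, -1.25), (2.0, 0.75), (4.0, -1.25)])
```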

Obviously, the initial choice of the three breakpoints cannot be completely
arbitrary. The algorithm will perform best when the true value of
$a^*$ is bracketed by the values $a_i$, and $\eta(g_{2l})$ as a function of the breakpoint
is approximately quadratic over the range spanned by the $a_i$. The use of this
scaling procedure is demonstrated in Section 5.


4.  Examples  -  Tail  Matching  Algorithm 

We  will  now  present  examples  of  the  use  of  gtm  in  approximating 
some  known  optimal  LO  nonlinearities. 

Generalized  Gaussian  Noise 

The first comparison is between the approximate and exact versions
of LO nonlinearities for the generalized Gaussian family. The exponent $\hat{c}$
is given by (3.7) after using the exact value $c$ in (3.5) to obtain $P_T$. Since
this is an analytical example, and the true noise density is known, numerical
methods can be used to obtain $v^*$, the value of $v$ which maximizes
$\eta(g_{tm})$. The performance of the suboptimal ZNL relative to the LO nonlinearity
may be measured by asymptotic relative efficiency, given in
terms of efficacy as

$$\mathrm{ARE}_{g_{tm},g_{LO}} = \frac{\eta(g_{tm})}{\eta(g_{LO})} \tag{3.16}$$

Fig. 3.8. Performance of $g_{2l}$ for various breakpoints $a$ with
optimal tail slope $b^*$. A Gaussian-Gaussian $\epsilon$-mixture density
for the noise is assumed with $\epsilon = .1$ and $\sigma_1^2/\sigma_0^2 = 750$.


Figure 3.9 compares the performance of $g_{tm}$, the LO detector and a linear
detector (ld), in terms of $\mathrm{ARE}_{g_{tm},ld}$ and $\mathrm{ARE}_{g_{LO},ld}$. The suboptimal nonlinearity
performs quite well for the range $1 < c < 2$, but for $c < 1$, performance
deteriorates. This is easily explained, since for small $c$, the LO
nonlinearity output approaches $\pm\infty$ for inputs near zero, while the
approximation method requires $g_{tm}$ to pass through the origin.


Johnson  Su  Noise 

Another family of heavy tailed densities introduced in the last
chapter is the Johnson Su family. The parameter $\delta$ controls tail heaviness,
and the density has a Gaussian-like shape near the mode. For the
purpose of example, it is a convenient density family to use in studying
the properties of the tail matching method. Since $f_\delta$ is given and
known, $P_T$ may be calculated from (3.5), and (3.7) gives $\hat{c}$. Again, numerical
methods can be used to find the $v^*$ that maximizes efficacy. Some
representative LO nonlinearities and suboptimal approximations are
given for various values of $\delta$ in Fig. 3.10, and Figure 3.11 presents the performance
comparison of $g_{tm}$, $g_{LO}$, and ld. For this family of densities, the
approximation method works quite well. Over the range $.8 \le \delta < \infty$, the
minimum of $\mathrm{ARE}_{g_{tm},g_{LO}}$ is .989 (occurring for $\delta = .8$). This means that
only a small performance penalty would be incurred if $g_{tm}$ were to
replace the LO detector. As a final comment, it should be observed that,
unlike the generalized Gaussian family, the Johnson Su family fulfills the
characteristics of a nearly Gaussian pdf given in Chapter 2, since $f_c$ is
sharply peaked, while $f_\delta$ has a Gaussian-like mode.

Fig. 3.9. Performance of $g_{tm}$ and $g_{LO}$ relative to the linear
detector ld for various exponents $c$ in the generalized Gaussian
density.

Fig. 3.10. The locally optimal nonlinearities $g_{LO}$ and suboptimal
nonlinearities $g_{tm}$ for two members of the Johnson Su family.
The nonlinearity outputs are scaled for comparison purposes.
For the case $\delta = 1$, the parameters of $g_{tm}$ are $\hat{c} = .752$,
$v^* = 3.26$, and $x_0 = 3$. For $\delta = 2$, $\hat{c} = 1.46$, $v^* = 1.88$, and $x_0 = 3$.

Fig. 3.11. Performance of nonlinearities $g_{tm}$ and $g_{LO}$ relative
to the linear detector ld, for noise densities parameterized
by $\delta$ of the Johnson Su family. The optimal parameters $\hat{c}$ and
$v^*$ are also given as a function of $\delta$.

Gaussian-Gaussian $\epsilon$-mixture Noise

The performance of $g_{tm}$ in a third family of heavy tailed densities was
also investigated. Here, the noise is assumed to be modeled by the
$\epsilon$-contaminated Gaussian-Gaussian mixture density, written as

$$f_\epsilon(x) = (1-\epsilon) f_0(x) + \epsilon f_1(x) \tag{3.17}$$

where $f_0$ represents the pdf of a zero mean Gaussian random variable,
and $f_1$ represents the pdf of another Gaussian random variable, with the
variance ratio $\sigma_1^2/\sigma_0^2$ large. The parameter $\epsilon$ controls the degree to which
$f_1$ contaminates the nominal density $f_0$, and is typically taken to be
small. Figure 3.12 shows a comparison between two LO nonlinearities and
their corresponding approximations. The approximate nonlinearities $g_{tm}$
do not appear as close to $g_{LO}$ in this example as for the Johnson Su family
for two reasons. First, the tails of $g_{LO}$ for the Gaussian-Gaussian $\epsilon$-mixture
increase almost linearly, while $g_{tm}$ is constrained to have power law tails.
Second, $g_{LO}$ has a total of four local extrema, while $g_{tm}$ is designed to
have a maximum of two. On the other hand, $g_{LO}$ for $f_\delta$ has two local
extrema, and the tails asymptotically approach the x-axis.

Fig. 3.12. The locally optimal nonlinearities $g_{LO}$ and suboptimal
nonlinearities $g_{tm}$ for two members of the Gaussian-Gaussian
$\epsilon$-mixture family. (a) $\epsilon = .05$, $\sigma_1^2/\sigma_0^2 = 5$, $\hat{c} = 1.54$,
$v^* = .957$, $x_0 = 3$. (b) $\epsilon = .20$, $\sigma_1^2/\sigma_0^2 = 20$, $\hat{c} = -.196$, $v^* = .821$,
$x_0 = 3$.

The performance of $g_{tm}$ was computed for a range of values of $\epsilon$ and
$\sigma_1^2/\sigma_0^2$ suggested by Vastola [18,19] as being representative of physical
noise situations. Figures 3.13 and 3.14 present the performance of the
tail matching method. The sets of curves indicate that the tail matching
algorithm generates nonlinearities which work quite well relative to the
optimal detector in Gaussian-Gaussian $\epsilon$-mixture noise.
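For reference, the locally optimal nonlinearity of the mixture density (3.17) can be written down directly as $-f'(x)/f(x)$. The sketch below assumes both Gaussian components have zero mean, which the text states only for the nominal component, and the function names are hypothetical.

```python
import math

def gauss_pdf(x, var):
    """Zero-mean Gaussian density with variance var."""
    return math.exp(-0.5 * x * x / var) / math.sqrt(2.0 * math.pi * var)

def mixture_pdf(x, eps, var0, var1):
    """Contaminated density: (1 - eps) * nominal + eps * contaminant."""
    return (1.0 - eps) * gauss_pdf(x, var0) + eps * gauss_pdf(x, var1)

def g_lo_mixture(x, eps, var0, var1):
    """Locally optimal nonlinearity -f'(x)/f(x) for the two-component mixture.
    Each Gaussian component contributes x / variance, weighted by its share
    of the density at x. Very large |x| can underflow both components."""
    w0 = (1.0 - eps) * gauss_pdf(x, var0)
    w1 = eps * gauss_pdf(x, var1)
    return x * (w0 / var0 + w1 / var1) / (w0 + w1)
```

Near the origin the nominal component dominates and the slope is close to $1/\sigma_0^2$; far out the contaminant dominates and the slope falls to $1/\sigma_1^2$, which is the nearly linear tail growth noted above.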

The  results  show  that  it  is  often  possible  to  achieve  nearly  optimal 
performance  using  this  simple  approximation  method.  The  salient 
feature  of  the  noise  tail  matching  method  is  its  ability  to  adjust  tail 
behavior  in  accordance  with  simple  observations  of  the  noise  tail  heavi¬ 
ness. 

Simulation 

To see how well this system might work in practice, some actual physical
noise was used to drive the system. The noise was collected underneath
the Arctic ice pack, and details may be found in [22]. A summary
of the data selected for simulation purposes is given in Appendix 2.1 of
this thesis. The noise data is highly nonstationary; a background Gaussian
noise is abruptly interrupted by segments of a high variance noise
generated during cracking of the ice pack.

Fig. 3.13. Performance of $g_{tm}$ relative to the linear detector
ld in Gaussian-Gaussian $\epsilon$-mixture noise for various values of
$\epsilon$ and a range of variance ratios $\sigma_1^2/\sigma_0^2$.

Fig. 3.14. Performance of $g_{tm}$ relative to $g_{LO}$ in Gaussian-Gaussian
$\epsilon$-mixture noise for various values of $\epsilon$ and a range of
variance ratios $\sigma_1^2/\sigma_0^2$. Curves are approximate due to
numerical roundoff errors.

Fig. 3.15. Sample Arctic under-ice noise data, record 2220,
adjusted to zero mean. Vertical scale is in standard deviations
from the mean.

Fig. 3.16. Sample Arctic under-ice noise data, record 2220,
adjusted to zero mean, and randomly permuted. Vertical
scale is in standard deviations from the mean.

To get a more nearly stationary noise for driving the system, the data
in each block was adjusted to zero mean and randomly permuted,
thereby simulating the output of a stationary noise source. This adjustment
was necessary solely to improve the rate of convergence of the stochastic
approximation algorithm for obtaining $v^*$. Figures 3.15 and 3.16
present a sample block of data, before and after random permutation,
respectively.
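The block preparation described here amounts to two operations, sketched below with a hypothetical function name. The seeded generator is used only so that the example is repeatable.

```python
import random

def prepare_block(block, rng):
    """Remove the block mean, then shuffle the samples to break up bursts."""
    mean = sum(block) / len(block)
    centered = [x - mean for x in block]
    rng.shuffle(centered)  # in-place random permutation
    return centered

rng = random.Random(1)
raw = [0.2, -0.1, 5.0, 4.5, 0.3, -0.4, 0.1, 0.0]  # toy block with a short burst
prepared = prepare_block(raw, rng)
```

The permutation leaves the empirical distribution of the block unchanged, so tail mass and variance estimates are unaffected; only the time ordering is destroyed.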

This noise was used as the input for the tail matching algorithm. A
threshold $T = 3\sigma$ was chosen for estimating the tail probability of the
noise, where $\sigma$ is the standard deviation of the noise data block. As more
noise data is observed, the cumulative tail probability estimate converges
to the true tail probability. The exponent $\hat{c}$ was estimated from this
cumulative estimate of $P_T$. The simulated system had no knowledge of
the true density generating the noise observations; therefore, the
Kiefer-Wolfowitz stochastic approximation method was used to find the value
$v^*$ which maximized $\eta(g_{tm})$ for $x_0 = 3$. The convergence rate towards $v^*$
is fixed by the particulars of the stochastic approximation algorithm, and
no formal attempts were made to optimize its performance.
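A generic Kiefer-Wolfowitz recursion for maximizing a scalar objective is sketched below. The gain sequences $a_n = a_0/n$ and $c_n = c_0/n^{1/3}$ are a common textbook choice and are an assumption here, as the text does not state which sequences were used. The objective is passed in as a callable; in the setting above it would be a noisy, block-based estimate of efficacy as a function of the scale factor.

```python
def kiefer_wolfowitz(objective, x0, steps=2000, a0=0.25, c0=0.5):
    """Stochastic approximation ascent using finite-difference slope estimates."""
    x = x0
    for n in range(1, steps + 1):
        a_n = a0 / n                  # step size, decreasing like 1/n
        c_n = c0 / n ** (1.0 / 3.0)   # probe half-width, shrinking more slowly
        slope = (objective(x + c_n) - objective(x - c_n)) / (2.0 * c_n)
        x += a_n * slope              # move uphill
    return x

# Noise-free toy objective with its maximum at v = 2.
v_est = kiefer_wolfowitz(lambda v: -(v - 2.0) ** 2, x0=0.0)
```

The slowly decaying step size is what makes the recursion tolerant of noisy objective evaluations, at the cost of the slow convergence visible even in this noise-free example.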

Figure 3.17 shows the running estimates of $\hat{c}$ and $v$ as a function of
sample number, and Fig. 3.18 shows the estimated value of $\mathrm{ARE}_{g_{tm},ld}$ for
each block of 1024 samples. Since the true distribution of the noise is
unknown, $\eta(g_{tm})$ was calculated by evaluating (3.9) using the empirical
distribution of the data block under consideration and the current estimate
of $g_{tm}$. The estimate of ARE results when $\eta(g_{tm})$ is multiplied by
the variance of the noise data block.
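The multiplication by the block variance follows from the efficacy of the linear detector. Taking $g(x) = x$ in the efficacy definition (3.13), with zero-mean noise of variance $\sigma^2$, gives

$$\eta(ld) = \frac{\left[\int 1 \cdot f(x)\,dx\right]^2}{\int x^2 f(x)\,dx} = \frac{1}{\sigma^2}, \qquad \mathrm{ARE}_{g_{tm},ld} = \frac{\eta(g_{tm})}{\eta(ld)} = \sigma^2\,\eta(g_{tm})$$

so an ARE estimate above one indicates that the adapted nonlinearity outperforms the linear detector on that block.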

At the end of the simulation, it was assumed that the parameters of
$g_{tm}$ were as near optimal as possible. These final values are given in Fig.
3.19. It was desired to compare the performance of $g_{tm}$ to the performance
of the linear detector. To do this, (3.9) was evaluated using the
final estimate of $g_{tm}$ and the empirical distribution of the 50 blocks of
1024 noise samples, and the result was multiplied by the variance of the
entire data set, yielding $\mathrm{ARE}_{g_{tm},ld} = 1.39$ as a performance estimate.

Fig. 3.17. Parameters $\hat{c}$ and $v^*$ for each selected Arctic
under-ice noise block of 1024 samples.

Fig. 3.18. The estimated performance of $g_{tm}$ relative to the
linear detector for each selected Arctic under-ice noise data
block. The parameters of $g_{tm}$ are those given in Fig. 3.17.

Fig. 3.19. The final estimated nonlinearity $g_{tm}$ at end of Arctic
under-ice noise data simulation. The final parameters are
$\hat{c} = 1.13$, $v^* = 1.35$, $x_0 = 3$.

Because the true distribution is unknown, $g_{LO}$ cannot be found, and it
is not possible to calculate the performance of the LO detector. However,
it is possible to conclude that the tail matching procedure was able to
adapt the suboptimal ZNL in a constructive way, for $g_{tm}$ shows improved
performance over the linear detector.

5.  Examples  -  Efficacy  Maximization  Algorithm 
Laplace  Noise 

Figure 3.20 provides a representative example of $g_{2l}$ when its parameters
are estimated assuming a Laplace density for the noise. The LO
detector in this case is a sign detector, sd. Intuition might suggest that
the best approximation employing two linear regions and a fixed nonzero
breakpoint is the amplifier-limiter $al(x; a>0) = g_{2l}(x; a>0, b=0)/a$, but
this turns out not to be the case. The best performance is obtained when
$g_{2l}$ has tails that return to meet the x-axis. Fig. 3.21 compares the performance
of $g_{2l}$, sd, and al in terms of their ARE relative to the linear
detector ld. Note that $g_{2l}$ has improved performance over both the
linear detector and the amplifier-limiter detector for any choice of $a \ne 0$.
Also, when $a \to 0$, both al and $g_{2l}/a$ approach the form of sd, and their
performances converge towards the optimal performance of sd.

For each particular value of $a$, the optimal tail slope $b^*$ was found by
iterative application of (A3.6) and (3.12). Convergence was typically rapid,
often requiring 3 iterations for a change of less than .001 in $b^*$.

Fig. 3.20. Comparison of the nonlinearity $g_{2l}$ and the sign
detector sd for $a = 1$ and $b^* = -.211$. The output of $g_{2l}$ is
scaled for comparison with sd.

Fig. 3.21. Performance of the nonlinearity $g_{2l}$, the amplifier-limiter
al, and the sign detector sd relative to the linear
detector ld for Laplace noise and various breakpoints $a$. Tail
slope $b^*$ is optimal for each choice of $a$.

Gaussian-Gaussian $\epsilon$-mixture Noise

Here we consider the performance of $g_{2l}$ in the presence of
Gaussian-Gaussian $\epsilon$-mixture noise. A representative example of $g_{LO}$ and
$g_{2l}$ is given in Fig. 3.22 for $f_\epsilon$ with parameters chosen in the middle of
the range suggested by Vastola [18,19] as being reasonable for observed
cases of Middleton's Class A noise density.

At the end of Section 3 a technique was described for finding the
estimated optimal breakpoint $\hat{a}^*$. It was employed for this example by
choosing the 3 points $a_1 = .5 q_{.9}$, $a_2 = q_{.9}$, and $a_3 = 2 q_{.9}$, with $q_{.9}$ meaning
the .9 quantile of the distribution. In practice, these quantities are
easily measured characteristics of a noise distribution. For the particular
example of Gaussian-Gaussian $\epsilon$-mixture noise, it was found that $\hat{a}^*$
was typically within 5% of the true value of $a^*$, and the efficacy of $g_{2l}$
using $\hat{a}^*$ was within 1% of the maximum possible efficacy of $g_{2l}$. Further,
the estimate $\hat{a}^*$ was stable for different choices of $\{a_1, a_2, a_3\}$. Note that
for this example the true values of $a^*$ were available only through computationally
burdensome numerical methods.

Given the estimated optimal breakpoint $\hat{a}^*$, the slope $b^*$ was found
iteratively, as before. Convergence of $b^*$ to within a .001 change
occurred typically within 6 iterations. Figure 3.23 shows $\mathrm{ARE}_{g_{2l},ld}$ for
various combinations of $\epsilon$ and $\sigma_1^2$. Figure 3.24 compares the performance
of $g_{2l}$ and $g_{LO}$, where it may be seen that the performance of $g_{2l}$ is within
a few percent of the optimal, and at worst, within 4%. The relatively
poorer performance occurs for the larger values of $\epsilon$, which assign more
probability mass to a region away from the origin. There the tails of the
LO nonlinearity diverge from the x-axis, while the approximate nonlinearity
$g_{2l}$ has truncated tails in this region. As a result, its performance
suffers slightly.

Fig. 3.22. Representative nonlinearities $g_{2l}$ and $g_{LO}$ for
Gaussian-Gaussian $\epsilon$-mixture noise with $\epsilon = .1$, $\sigma_1^2/\sigma_0^2 = 750$,
$a = 2.5$ and $b^* = -.524$. The output of $g_{2l}$ is scaled for comparison.

Fig. 3.23. Performance of the nonlinearity $g_{2l}$ relative to the
linear detector ld in Gaussian-Gaussian $\epsilon$-mixture noise for
various values of $\epsilon$ and range of variance ratios $\sigma_1^2/\sigma_0^2$.

Fig. 3.24. Performance of the nonlinearity $g_{2l}$ relative to the
locally optimal nonlinearity $g_{LO}$ for Gaussian-Gaussian
$\epsilon$-mixture noise, for various values of $\epsilon$ and range of variance
ratios $\sigma_1^2/\sigma_0^2$.

Simulation 

As  was  done  for  the  tail  matching  algorithm,  the  Arctic  under-ice 
noise  data  was  used  to  examine  the  performance  of  the  efficacy  maximiz¬ 
ing  procedure.  The  same  50  high-kurtosis  data  blocks  used  previously 
and  described  in  Appendix  2.1  were  used  to  drive  the  algorithm,  after 
each  data  block  was  adjusted  to  zero  mean.  Unlike  the  simulation  in  Sec¬ 
tion  4,  no  further  manipulation  was  necessary  to  prepare  the  data. 

Here  the  .9  quantile  of  the  noise  distribution  was  estimated  for  each 
data  block  as 


$$\hat q_{.9} = \frac{q_{.9} - q_{.1}}{2} \tag{3.18}$$


to minimize the effects of the high skew occasionally observed. The 3-point parabolic fitting method was used to estimate $\hat a^*$, with $a_1 = .5q_{.9}$, $a_2 = q_{.9}$, and $a_3 = 2q_{.9}$ serving as the three arbitrary breakpoints. A minor modification to the algorithm was made, requiring that $a_1 \le \hat a^* \le a_3$. Any $\hat a^*$ outside this range was replaced by $a_1$ or $a_3$, as appropriate. The modification ensures that the algorithm does not produce highly inaccurate values of $\hat a^*$ when the interval $[a_1, a_3]$ does not bracket the true value $a^*$. The estimated values $\hat a^*$, and ultimately the performance of $g_{zl}$, were insensitive to using nearby quantiles in place of $q_{.9}$.
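The three-point parabolic fit with the bracketing safeguard can be sketched as follows. This is only an illustration under stated assumptions, not the procedure of Section 3 itself: the names `parabola_vertex` and `estimate_breakpoint` are hypothetical, the quantity being maximized is passed in as an arbitrary callable `efficacy`, and the fallback used when the fitted parabola has no interior maximum is an added assumption.

```python
import numpy as np

def parabola_vertex(xs, ys):
    """Abscissa of the maximum of the parabola through three points."""
    # Exact fit of y = c2*x^2 + c1*x + c0 through the three points.
    c2, c1, _ = np.polyfit(xs, ys, 2)
    if c2 >= 0:
        # Assumption: with no interior maximum, keep the best sampled point.
        return float(xs[int(np.argmax(ys))])
    return float(-c1 / (2.0 * c2))

def estimate_breakpoint(efficacy, q90):
    """Estimate the breakpoint from efficacy at .5*q90, q90 and 2*q90."""
    a = np.array([0.5 * q90, q90, 2.0 * q90])
    eta = np.array([efficacy(ai) for ai in a])
    a_hat = parabola_vertex(a, eta)
    # Safeguard from the text: force the estimate into [a1, a3].
    return float(np.clip(a_hat, a[0], a[2]))
```

When the efficacy curve is close to quadratic near its peak, the vertex lands near the true maximizer; otherwise the clamp simply returns the nearer end of the bracket.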
Once $\hat a^*$ was found for each block, $b^*$ was found using the previously discussed iterative procedure. Figure 3.25 shows the estimated values $\hat a^*$ and $b^*$ for each of the data blocks in the simulation. Note that both of the parameters appear to have fairly steady nominal values. For each data block, $\eta(g_{zl})$ was estimated by taking the current value of $\hat a^*$ and $b^*$ and evaluating (3.14) or (3.15), as appropriate, with respect to the noise data block empirical distribution. Multiplying this result by the estimated variance of the data block yields an estimate of $ARE_{g_{zl},ld}$, shown in Fig. 3.26.
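A per-block estimate of this kind (empirical efficacy multiplied by the block variance) might be computed along the following lines. This is a sketch under assumptions: expressions (3.14) and (3.15) are not reproduced here, so the code uses the standard efficacy form $[E g']^2 / E g^2$ with sample means in place of expectations. The names `g_zl`, `g_zl_prime` and `are_vs_linear` are hypothetical, and truncating a negative-slope tail at zero is inferred from the description of truncated tails in the text.

```python
import numpy as np

def g_zl(x, a, b):
    """Piecewise-linear ZNL: unity slope for |x| <= a, tail slope b beyond.

    Assumption: for b < 0 the tail is held at zero once it reaches the x-axis.
    """
    ax = np.abs(x)
    tail = a + b * (ax - a)
    if b < 0:
        tail = np.maximum(tail, 0.0)
    return np.sign(x) * np.where(ax <= a, ax, tail)

def g_zl_prime(x, a, b):
    """Derivative of g_zl (zero where a negative-slope tail is truncated)."""
    ax = np.abs(x)
    slope = np.full_like(ax, float(b))
    if b < 0:
        slope = np.where(a + b * (ax - a) > 0.0, slope, 0.0)
    return np.where(ax <= a, 1.0, slope)

def are_vs_linear(block, a, b):
    """Empirical efficacy of g_zl multiplied by the block variance."""
    x = np.asarray(block, dtype=float)
    x = x - np.mean(x)                  # adjust the block to zero mean
    eta = np.mean(g_zl_prime(x, a, b)) ** 2 / np.mean(g_zl(x, a, b) ** 2)
    return eta * np.mean(x ** 2)        # linear detector efficacy is 1/variance
```

With a breakpoint far beyond the data and $b = 1$ the nonlinearity is linear and the estimate is 1 by construction, which gives a quick sanity check.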

At the end of the simulation, the average values taken by $\hat a^*$ and $b^*$ were computed, and are given in Fig. 3.27 along with a depiction of $g_{zl}$ using the average values. Again, if these "final" parameter values are used in (3.15) for the entire 50 blocks of 1024 noise samples, and the result is multiplied by the overall noise variance, $ARE_{g_{zl},ld} = 1.39$ is obtained as an estimate of performance. Surprisingly, this is exactly the same result as the tail matching algorithm overall performance.

6.  Conclusion 

The  conclusion  to  be  drawn  from  this  study  is  that  it  is  possible  to 
implement  adaptive  detector  nonlinearities  using  fairly  simple  tech¬ 
niques. 

Tail  Matching 

Of the two methods suggested, the first, utilizing an estimate of tail behavior, is quite simple: let the tails of the suboptimal ZNL be the tails of the locally optimal nonlinearity for a density with the same tail probability mass as the observed noise distribution. Apparently $P_T$ conveys enough information about tail behavior of the noise density, and fairly good performance results using even a crude approximation to the true ZNL tails. A more sophisticated estimate of $c$ might improve the performance of $g_{tm}$, as might a different choice altogether for the class of density tails used for $g_{tm}$. It would be interesting to discover how much additional complexity any resulting performance gain could justify.

Fig. 3.25. Estimated parameters $\hat a^*$ and $b^*$ of the nonlinearity $g_{zl}$ for each of the selected Arctic under-ice noise data blocks.

Fig. 3.26. The estimated performance of $g_{zl}$ relative to the linear detector ld for each of the selected Arctic under-ice noise data blocks. The parameters of $g_{zl}$ are those given in Fig. 3.25.

Other  related  approaches  were  recently  explored  by  Modugno  [17]. 
Some  work  done  by  Miller  and  Thomas  [9]  in  approximation  of  LO  non- 
linearities  suggests  that  even  very  simple  approximants  of  the  optimal 
nonlinearity  have  the  potential  to  achieve  performance  which  is  accept¬ 
ably  near  the  optimal. 

Efficacy  Maximization 

The  second  method  suggests  maximizing  the  performance  of  a  simple 
generic  nonlinearity.  The  approximate  detector  ZNL  consists  of  a  central 
linear  region  of  unity  slope  surrounded  by  linear  tail  regions,  generally  of 
different  slope.  A  closed  form  expression  (A3. 6)  for  the  tail  slope  is  given, 
and  a  method  is  suggested  for  determining  approximately  the  appropri¬ 
ate  breakpoint  between  central  and  tail  regions  in  the  ZNL.  The  tail  slope 
is  either  known  exactly  after  a  single  application  of  (A3. 6),  or  after  a  few 
iterations  using  (A3. 6). 

Some examples show that the performance of the suboptimal ZNL compares well with the LO detector performance, at least when the partial moments of the noise density are known exactly. In the examples, the performance is very good, usually within a few percent of the optimal performance. In practice, the performance of $g_{zl}$ may not be quite so good, since at best only estimates of the partial moments would be available. However, these partial moments are easily estimated, since the highest order is second degree. Also, each integration typically spans a region containing a nontrivial amount of probability mass; therefore it should be fairly easy to converge quickly to low variance estimates of the partial moments. The sensitivity of $\eta(g_{zl})$ to errors in the partial moments has not been examined at this time.

The advantage of this method is that implementation of the proposed nonlinearity is quite simple; all that is required is the ability to apply different linear gains (plus constant offsets) to inputs occurring along different regions of the x-axis. As a result, adaptation of $g_{zl}$ can be accomplished with little overhead, once $a^*$ and $b^*$ are known.

The chief disadvantage of this method is the fact that negative values of $b^*$ must be found iteratively. However, intermediate values of $b^*$ are useful; the performance of $g_{zl}$ is not maximal, but it is nearly so, and the performance improves monotonically with each iteration.

Another complication is the fact that an explicit solution for $a^*$ is not available. The parabolic fitting method mentioned may be used to estimate $a^*$, or other methods may be used to converge to the best value. On the other hand, it appears that precise placing of the breakpoint is not a critical matter.

The parabolic fitting procedure for finding the breakpoint is also applicable to finding the appropriate scale factor to the input of a ZNL. Instead of performing a Kiefer-Wolfowitz stochastic approximation to find the optimal scale factor, the parabolic fitting method could be adapted for solving the scaling problem. The quantities involved in the parabolic fitting method are expectations of the noise observations transformed by the square or first derivative of the ZNL. For heavy-tailed noises, the nonlinearity tails allow the very large noise observations to have much less influence than if they were untransformed or linearly processed; therefore, the large observations contribute very little to the computed expectation of the square and first derivative of the nonlinearity. Intuition suggests that this might be an inherently more robust procedure than computing variance through expectation of the squared noise observations.

Comparison  of  Algorithms 

The performance of $g_{tm}$ and $g_{zl}$ may be compared by computing $ARE_{g_{zl},g_{tm}}$ under identical noise situations. Because of the appeal of $f_\epsilon$ as a reasonable model for certain observed noise densities, it will be used as the standard for comparison. Figure 3.28 presents $ARE_{g_{zl},g_{tm}}$, where it may be seen that there is some advantage to the efficacy maximizing algorithm giving $g_{zl}$. This should not be surprising, since the algorithm for $g_{tm}$ is an "open loop" procedure which does not optimize the ZNL shape. (Both algorithms optimize scale with respect to efficacy.) Further, $g_{zl}$ may be regarded as having simpler shape than $g_{tm}$, for it is piecewise linear, while $g_{tm}$ has power law tails and a polynomial central region.

As was noted at the end of the simulations, the estimated performance improvement was 1.39 for either detector relative to a linear detector. However, the value of $ARE_{g_{zl},ld}$ was obtained for a single fixed pair of parameters $(\hat a^*, b^*)$. If $\hat a^*$ and $b^*$ are allowed to vary as the noise statistics change from block to block, Fig. 3.29 illustrates that $g_{zl}$ may have a slight advantage over $g_{tm}$. The reason for this is that, while the adaptation algorithm for $g_{tm}$ forces the parameters to converge, the algorithm for $g_{zl}$ does not include any memory of the parameters from the previous noise data block. Thus, the adaptation of $g_{zl}$ may be considered as more agile.

Fig. 3.28. Performance of the nonlinearity $g_{zl}$ relative to the nonlinearity $g_{tm}$ in Gaussian-Gaussian ε-mixture noise for various $\epsilon$ and range of variance ratios $\sigma_1^2/\sigma_0^2$.

Fig. 3.29. The estimated performance of the nonlinearities $g_{zl}$ (solid line) and $g_{tm}$ (broken line) relative to the linear detector ld for each of the selected Arctic under-ice noise data blocks. The parameters of the nonlinearities are the values current for each noise block.

Both algorithms, to some extent, are ad hoc. The purpose in exploring these methods was not to find a definitive algorithm for designing nearly optimal, but simple, detector nonlinearities. Instead, the objective was to gain insight into this problem. The conclusion is that even relatively unsophisticated ZNL design techniques have potential for highly successful application, provided the design algorithm has available some knowledge of the noise density tail shape.
Appendix 3.1

We seek to maximize (3.14) with respect to $b$, and begin with some notational preliminaries. The first step is to expand the denominator to obtain

$$\eta = \frac{2\left[\int_0^a f + b\int_a^\infty f\right]^2}{\int_0^a x^2 f + b^2\int_a^\infty x^2 f + 2ab(1-b)\int_a^\infty x f + a^2(1-b)^2\int_a^\infty f} \tag{A3.1}$$

For convenience, we rewrite (A3.1) as

$$\eta = \frac{2\left[I_1 + bI_2\right]^2}{I_3 + b^2 I_4 + 2ab(1-b)I_5 + a^2(1-b)^2 I_2} \tag{A3.2}$$

Taking the partial derivative of (A3.2) with respect to $b$, we obtain

$$\frac{\partial\eta}{\partial b} = \frac{4(I_1 + bI_2)}{d^2}\left[\, b\left(aI_2I_5 - a^2I_2^2 - I_1I_4 + 2aI_1I_5 - a^2I_1I_2\right) + \left(I_2I_3 + a^2I_2^2 - aI_1I_5 + a^2I_1I_2\right)\right] \tag{A3.3}$$

with $d$ representing the denominator of (A3.2). The notation is further simplified by rewriting (A3.3) as

$$\frac{\partial\eta}{\partial b} = \frac{4(I_1 + bI_2)}{d^2}\left[\,bC + D\,\right] \tag{A3.4}$$

A necessary condition for (A3.1) to have a maximum is that $\partial\eta/\partial b = 0$ has a solution for some $b$. A solution always exists, since the roots of (A3.4) are the pair $\{-I_1/I_2,\ -D/C\} = \{b_0, b_1\}$.

A sufficient condition for a maximum to occur at $b^* \in \{b_0, b_1\}$ is $\left.\partial^2\eta/\partial b^2\right|_{b^*} < 0$. It is not necessary to evaluate the second derivative; all that is required is to note that the numerator of the first derivative is quadratic in $b$, and the denominator is positive for all $b$. Although $\partial\eta/\partial b$ is not parabolic, $\partial^2\eta/\partial b^2$ has the same sign at $b^*$ as the slope of the quadratic numerator evaluated at $b^*$, one of its roots. Therefore, the root of (A3.3) giving maximum efficacy will be located on the negative sloping branch of the numerator parabola. Taking this into account, and the fact that, if $C > 0$, the smaller root is on the negative sloping branch, and if $C < 0$, the larger root is on the negative sloping branch, the solution of (A3.4) which maximizes (A3.1) is

$$b^* = \begin{cases} \max\{b_0, b_1\} & \text{for } C < 0 \\ \min\{b_0, b_1\} & \text{for } C > 0 \end{cases} \tag{A3.6}$$

where

$$b_0 = -I_1/I_2$$

and

$$b_1 = -\frac{I_2I_3 + a^2I_2^2 - aI_1I_5 + a^2I_1I_2}{aI_2I_5 - a^2I_2^2 - I_1I_4 + 2aI_1I_5 - a^2I_1I_2}$$

with the partial moments

$$I_1 = \int_0^a f, \qquad I_2 = \int_a^\infty f, \qquad I_3 = \int_0^a x^2 f, \qquad I_4 = \int_a^\infty x^2 f, \qquad I_5 = \int_a^\infty x f$$
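As a numerical check on the closed form, the following sketch computes the partial moments for a unit-scale Laplace density, selects the tail slope by the rule (A3.6), and compares the resulting efficacy (A3.2) against a grid of alternative slopes. The helper names (`integrate`, `partial_moments`, `efficacy`, `best_tail_slope`), the choice of Laplace noise, and the finite upper limit standing in for infinity are all assumptions made for the example. The code evaluates the untruncated expression only; when the selected slope is negative, the text applies a truncated range of integration and iterates, which is not reproduced here.

```python
import numpy as np

def integrate(y, x):
    """Trapezoidal rule on a sampled integrand."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def partial_moments(pdf, a, upper=60.0, n=200001):
    """I1..I5 of Appendix 3.1, with `upper` standing in for infinity."""
    xc = np.linspace(0.0, a, n)        # central region [0, a]
    xt = np.linspace(a, upper, n)      # tail region [a, upper]
    I1 = integrate(pdf(xc), xc)
    I2 = integrate(pdf(xt), xt)
    I3 = integrate(xc**2 * pdf(xc), xc)
    I4 = integrate(xt**2 * pdf(xt), xt)
    I5 = integrate(xt * pdf(xt), xt)
    return I1, I2, I3, I4, I5

def efficacy(b, a, I):
    """Efficacy (A3.2) for tail slope b and breakpoint a."""
    I1, I2, I3, I4, I5 = I
    d = I3 + b**2 * I4 + 2*a*b*(1 - b)*I5 + a**2 * (1 - b)**2 * I2
    return 2.0 * (I1 + b*I2)**2 / d

def best_tail_slope(a, I):
    """Tail slope selected by the rule (A3.6)."""
    I1, I2, I3, I4, I5 = I
    C = a*I2*I5 - a**2*I2**2 - I1*I4 + 2*a*I1*I5 - a**2*I1*I2
    D = I2*I3 + a**2*I2**2 - a*I1*I5 + a**2*I1*I2
    b0, b1 = -I1/I2, -D/C
    return max(b0, b1) if C < 0 else min(b0, b1)

laplace = lambda x: 0.5 * np.exp(-np.abs(x))
I = partial_moments(laplace, a=1.0)
b_star = best_tail_slope(1.0, I)
```

Because the derivative (A3.4) has only the two roots $b_0$ and $b_1$, and the efficacy is zero at $b_0$, no slope on the comparison grid should exceed the efficacy obtained at the selected root.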
Appendix 3.2

Assume for the moment that the truncated range of integration is ignored and (A3.6) is used directly to maximize (3.14), resulting in $b^* < 0$. The effect of ignoring the tail truncation at $\pm x_T$ is equivalent to allowing the nonlinearity $g_v$ to have virtual tails like those illustrated in Fig. 3.7. Additional nonzero tails in $g_v$ are an artifact of the improper range of integration, and as a result, (A3.6) optimizes $g_v$ instead of $g_{zl}$. This affects the final result $b^*$, since the detector using $g_v$ would not perform as well as one using $g_{zl}$, the truncated version. A simple argument explains why: due to the additional tail area, $E\,g_v' < E\,g_{zl}'$, and $E\,g_v^2 > E\,g_{zl}^2$. Combining these two facts, we find

$$\frac{\int_{-\infty}^{\infty} g_v'\, f}{\int_{-\infty}^{\infty} g_v^2\, f} < \frac{\int_{-\infty}^{\infty} g_{zl}'\, f}{\int_{-\infty}^{\infty} g_{zl}^2\, f} \tag{A3.7}$$

For fixed and equal false alarm rates, $g_v$ will have lower power of detection than $g_{zl}$ asymptotically as the number of samples grows large [14, p. 228]. A weak sufficient condition for the inequality (A3.7) to hold under squaring is

$$b^* \ge -\int_0^a f \Big/ \int_a^\infty f \tag{A3.8}$$

This lower bound on $b$ is $b_0$ from Appendix 3.1. Since $b^* \in \{b_0, b_1\}$ and $b_1 > b_0$, the condition (A3.8) is satisfied for $C < 0$ and $\eta(g_{zl}) \ge \eta(g_v)$. For $C > 0$ it should be possible to prove the observation that $b_0 \le b_1$, making the squared inequality true for this case also.

A rigorous proof of the inequality (A3.7) under squaring may be found by considering the effect of the virtual tails in the denominator terms of (A3.7) in addition to their effect in the numerator terms only, as was done here.
4


Signal  Detection 
in  Bursts  of  Impulsive  Noise 


The  previous  chapter  was  concerned  with  the  development  of  a  fixed 
detector  nonlinearity  which  could  adapt  to  the  noise  statistics  of  the  par¬ 
ticular  environment.  There  it  was  assumed  that,  over  short  periods  of 
time,  the  statistics  were  nearly  stationary.  Another  approach  to  the 
detection  problem  is  given  in  this  chapter,  where  it  is  assumed  that  the 
noise  statistics  can  change  abruptly. 

The fundamental idea explored is that, if the abrupt changes in the noise can be recognized, a detector may use this knowledge to achieve improved performance with respect to a detector whose structure is based upon an assumption of nearly stationary noise statistics. Section 1 provides the background and motivation of the problem. Section 2 develops a model for noise with abruptly changing statistics; specifically, the case of a Gaussian background noise interrupted by bursts of an impulsive contaminant noise is considered. A detector structure is proposed, and its performance is analyzed. Section 3 examines the problem of distinguishing between impulsive bursts and the background noise. The proposed detector and impulsive burst recognition algorithm are simulated in Section 4, and a few concluding comments are given in Section 5.


1.  Introduction 

Considerable  attention  has  been  paid  to  the  problem  of  recognizing 
sudden  changes  in  the  stochastic  environment  of  a  system.  Basseville  [1]
and  Willsky  [2]  summarize  some  of  the  techniques  which  have  been 
developed.  One  approach  in  treating  this  problem  involves  the  use  of 
characterizations  that  allow  for  abrupt  changes  in  the  noise  statistics.  In 
some  simple  cases,  the  noise  model  consists  of  two  distinct  density  func¬ 
tions,  each  describing  a  unique  mode  of  noise  generation.  During  nono¬ 
verlapping  time  intervals,  one  of  the  pair  is  considered  to  be  the  particu¬ 
lar  valid  description  of  the  noise  density. 

Fig. 4.1 illustrates a conceptual representation of this situation. Only the sequence $\{n_i\}$ may be observed. The sequence $\{e_i\}$ chooses between $n_{0,i}$ and $n_{1,i}$ on a sample-by-sample basis. While $\{e_i\}$ cannot be observed directly, it may be possible to construct an estimate of it by observing the behavior of $\{n_i\}$. A usual assumption is that $e_i$ does not switch "too rapidly"; loosely speaking, after switching into a new state, $e_i$ tends to stay there for a while. It is this property that allows an observer to distinguish between the two noise modes. This assumption is clarified further in Section 3.

Fig. 4.1. A representation of the dual mode noise generation mechanism.
For a physical example of a noise with abruptly changing statistics, consider the case of the noise environment under the Arctic ice pack [3]. For most of the time, the noise appears to be Gaussian. Occasionally, though, ice cracking occurs, and a short burst of a relatively high variance noise is observed. After the cracking event is complete, the noise returns to a nominal low variance Gaussian mode. Fig. 4.2 reveals the distinctive difference in the behaviors of the two noise modes. This type of noise may be described as an impulsively contaminated Gaussian noise.

Various statistical models have been proposed for describing a noise environment that is nominally Gaussian with an additive impulsive noise component. As was discussed in Chapter 2, these models often take the form of univariate pdf's that are heavy-tailed relative to the Gaussian pdf [e.g., 10,11,16,17]. Implicit with the use of a univariate noise model, however, is the assumption that the noise statistics are stationary, at least over the interval of interest. The Arctic under-ice noise is a counterexample to this assumption, since over the short term the noise statistics appear to be nonstationary. The impulsive noise occurs in bursts, and a nonstationary model for the noise seems more appropriate than some fixed model.

It may be possible to find a multivariate noise distribution which adequately describes a background Gaussian noise with bursts of an impulsive contaminant. Unfortunately, finding multivariate non-Gaussian noise models is in general a complicated problem, even in fairly straightforward situations. See, for example, [12-15]. Furthermore, complicated multivariate noise distributions may lead to unacceptably complicated optimal detector structures.

Fig. 4.2. Time domain plot of sample Arctic under-ice noise data record 2220. Vertical scale is in standard deviations from the mean.

When a heavy-tailed noise has a burst-like structure as in Fig. 4.2, it is reasonable to develop a detector that recognizes the dual-mode nature of the noise and adapts rapidly to the particular operative mode. The purpose of this chapter is to illustrate the potential advantage of such a switched burst (SB) detector.


2.  Switched  Burst  Detector 

We shall restrict attention to the discrete time locally optimal (LO) detection of a known constant signal in a Gaussian background noise contaminated by bursts of impulsive noise. All the noise samples are assumed independent, but not necessarily identically distributed. In more precise form, the problem is to observe $x = \{x_i\}$, $i = 1, 2, \ldots, M$, and decide between

$$H_0:\ x = n$$
$$H_1:\ x = n + \theta s$$

where $n = \{n_i\}$, $i = 1, 2, \ldots, M$, and $s$ is a known constant signal of length $M$ with nonzero amplitude parameter $\theta$. As is well known [5], the LO detector test statistic in the case of white noise is any monotone function of

$$\sum_{i=1}^{M} \left.\frac{\partial}{\partial\theta}\ln f_i(x_i - \theta s)\right|_{\theta=0}$$

where $f_i$ is the univariate density of $n_i$. The term being summed is a memoryless nonlinearity $g_i(x_i) = -f_i'(x_i)/f_i(x_i)$.
Stationary Impulsive Model

One common empirical model of impulsively contaminated Gaussian noise is the Gaussian-Gaussian ε-mixture density, which may be written as

$$f_\epsilon(x) = (1-\epsilon) f_0(x) + \epsilon f_1(x) \tag{4.1}$$

Here $f_0$ represents a background low variance Gaussian noise, and $f_1$ represents a high variance (impulsive) Gaussian component. Any particular observation $x$ is generated by the impulsive component with probability $\epsilon$. Vastola [4] recently suggested that (4.1) is also a useful simplification of Middleton's Class A model [10,11] for impulsive noise environments.

Using (4.1) as the univariate density of the noise, and assuming that the noise samples are identically distributed, the LO detector nonlinearity is fixed for all samples as

$$g_\epsilon(x) = x\,\frac{\dfrac{1-\epsilon}{\sigma_0^2} f_0(x) + \dfrac{\epsilon}{\sigma_1^2} f_1(x)}{(1-\epsilon) f_0(x) + \epsilon f_1(x)} \tag{4.2}$$

For convenience, the overall noise variance is assumed to be unity.


Switched  Burst  Nonstationary  Model 

Consider the nonstationary noise density

$$f_{SB}(x_i, e_i) = (1 - e_i) f_0(x_i) + e_i f_1(x_i) \tag{4.3}$$

for $i = 1, 2, \ldots$. Here, $e_i$ takes on the value 0 or 1, and $f_0$ and $f_1$ are two arbitrary densities, which are not necessarily Gaussian. When $e_i$ is zero, the noise is in the background mode, and the observed noise has density $f_0$. When $e_i$ is unity, the noise is in the impulsive mode and the observed noise density becomes $f_1$. The sequence $\{e_i\}$ is defined to have the property

$$\epsilon = \lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} e_i \tag{4.4}$$

The implication of (4.4) is that, over a long observation period, a proportion $\epsilon$ of the samples come from the impulsive noise mode, and a proportion $(1-\epsilon)$ come from the background mode. Noise samples described by $f_{SB}$ may be thought of as being generated by the mechanism of Fig. 4.1. In many cases of interest, the background mode is dominant, and therefore $\epsilon$ is often estimated or assumed to be small [4,8,17]. Note that unlike $f_\epsilon$, the density $f_{SB}$ is nonstationary. However, the noise in each individual mode may be considered stationary, with the sequence $\{e_i\}$ controlling which mode is observed.
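A generator for noise of this kind can be sketched as below. The text only requires that the switching sequence spends a long-run fraction $\epsilon$ of its time in the impulsive mode and does not switch too rapidly; modelling it as a two-state Markov chain with a chosen mean burst length is an assumption made for illustration, as are the function names and the Gaussian choice for both modes.

```python
import numpy as np

def switching_sequence(n, eps, mean_burst, rng):
    """Two-state Markov chain e_i in {0, 1} with long-run fraction eps of ones."""
    p10 = 1.0 / mean_burst                 # probability of leaving a burst
    p01 = eps * p10 / (1.0 - eps)          # makes the stationary P(e = 1) equal eps
    u = rng.random(n)
    e = np.zeros(n, dtype=int)
    state = int(rng.random() < eps)        # start from the stationary distribution
    for i in range(n):
        e[i] = state
        flip = u[i] < (p10 if state else p01)
        state = 1 - state if flip else state
    return e

def switched_burst_noise(e, sigma0, sigma1, rng):
    """Draw n_i from N(0, sigma0^2) when e_i = 0 and from N(0, sigma1^2) when e_i = 1."""
    sigma = np.where(e == 1, sigma1, sigma0)
    return sigma * rng.standard_normal(e.size)
```

A larger `mean_burst` makes the sequence "stickier" without changing the long-run proportion, which is the property a burst recognizer would rely on.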

For the purpose of comparison with the Gaussian-Gaussian ε-mixture density, $f_0$ and $f_1$ shall be the same densities as those composing $f_\epsilon$. One rationale for picking $f_0$ and $f_1$ to be Gaussian is that they are the two leading and most significant terms in Middleton's Class A density model [4,24]. Equivalently, this assumption implies that the impulsive contaminant is itself a Gaussian noise source. The case of $f_0$ and $f_1$ both Gaussian shall be designated as Gaussian-Gaussian switched burst noise.

The observations $x_i$ will continue to be assumed independent for any arbitrary switching sequence $\{e_i\}$. The noise density on a sample by sample basis is either $f_0$ or $f_1$, which requires that the nonlinearity used at a particular sample time is $g_0$ or $g_1$, respectively. Ideally, $g_0$ and $g_1$ are the locally optimal nonlinearities associated with the two densities. The test statistic becomes

$$\sum_{i=1}^{M} g_{SB}(x_i) \tag{4.5}$$

where

$$g_{SB} = \begin{cases} g_0 & \text{for } e_i = 0 \\ g_1 & \text{for } e_i = 1 \end{cases} \tag{4.6}$$

In practice, $\{e_i\}$ would not be known; instead, some additional structure is required to generate a sequence $\{\hat e_i\}$ as an estimate of $\{e_i\}$. This problem receives attention in Section 3.
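The switched test statistic can be illustrated with a short sketch. The function names are hypothetical, and the Gaussian LO nonlinearity is used only as a convenient example of $g_0$ and $g_1$; any pair of memoryless nonlinearities could be passed in.

```python
import numpy as np

def lo_gaussian(sigma):
    """LO nonlinearity -f'(x)/f(x) for a zero-mean Gaussian: linear, slope 1/sigma^2."""
    return lambda x: x / sigma**2

def switched_statistic(x, e_hat, g0, g1):
    """Sum g0(x_i) where e_hat_i = 0 and g1(x_i) where e_hat_i = 1."""
    x = np.asarray(x, dtype=float)
    e_hat = np.asarray(e_hat)
    return float(np.sum(np.where(e_hat == 1, g1(x), g0(x))))

# A large sample flagged as impulsive is weighted down by 1/sigma1^2.
x = np.array([1.0, -2.0, 30.0, 0.5])
e_hat = np.array([0, 0, 1, 0])
t = switched_statistic(x, e_hat, lo_gaussian(1.0), lo_gaussian(10.0))
```

In this toy input the sample of amplitude 30 contributes only 0.3 to the sum, whereas a fixed unit-slope linear detector would let it dominate.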

Ideal  Detector  Performance 

In this subsection, expressions are given for the performance of the switched burst detector with the assumption that the switching sequence may be reconstructed without error. Performance will be analyzed for arbitrary densities $f_0$ and $f_1$ with arbitrary nonlinearities $g_0$ and $g_1$. Specific results for Gaussian-Gaussian switched burst noise with linear detectors will also be developed. The next subsection explores detector performance without this ideal knowledge.

A definition for the efficacy of an arbitrary stationary detector $g$ with zero mean under the noise $f$, given in [5,6] and discussed in Chapter 2, is

$$\eta_f(g) = \frac{\left[E_f\, g'\right]^2}{E_f\, g^2} \tag{4.7}$$

The case of interest is nonstationary, as the nonlinearities composing $g_{SB}$ switch in accordance with the switching of the underlying noise densities. Appendix 6.1 demonstrates that, for the switching detector in the presence of switched burst noise, (4.7) may be rewritten as

$$\eta_{SB}(g_{SB}) = \frac{\left[(1-\epsilon)E_{f_0}\, g_0' + \epsilon E_{f_1}\, g_1'\right]^2}{(1-\epsilon)E_{f_0}\, g_0^2 + \epsilon E_{f_1}\, g_1^2} \tag{4.8}$$

The formulation (4.8) is general, and does not depend on the fact that $f_0$ and $f_1$ were previously defined to be Gaussian densities. For the particular case of the Gaussian densities used in (4.1) and (4.3), the LO detector is linear with slope $\sigma_0^{-2}$ for $f_0$ and with slope $\sigma_1^{-2}$ for $f_1$. Applying this fact to (4.8), the expression for $\eta_{SB}$ reduces to

$$\eta_{SB} = \frac{1-\epsilon}{\sigma_0^2} + \frac{\epsilon}{\sigma_1^2} \tag{4.9}$$

Note that (4.8) may also be used to evaluate the performance of $g_\epsilon$ in the presence of $f_{SB}$. In this case, the detector nonlinearity is the same whether the background or impulsive noise is observed; therefore we can let $g_0 = g_1 = g_\epsilon$ in (4.8).

A convenient measure for comparing the performance of two LO detectors $d_1$ and $d_2$ is asymptotic relative efficiency, defined in Chapter 2 as

$$ARE_{d_1,d_2} = \eta_{d_1}/\eta_{d_2} \tag{4.10}$$

The two nonlinearities $g_\epsilon$ and $g_{SB}$ may be compared by computing their efficacies and evaluating (4.10).

Fig. 4.3 presents a plot of $ARE_{g_{SB},g_\epsilon}$ for combinations of $\epsilon$ and $\sigma_1^2/\sigma_0^2$, from which it is clear that there is an advantage to the switched detector method over the fixed detector. An intuitive explanation for this is that in switched burst noise, the noise is always Gaussian (though with nonstationary variance), and the LO detector is always linear (with nonstationary slope). Thus, the switched detector maximizes efficacy, for it is always locally optimal at any given sample time. On the other hand, $g_\epsilon$ has two nearly linear regions, but it is not the locally optimal detector for Gaussian noise. If the stationary density $f_\epsilon$ is thought of as a time averaged version of $f_{SB}$, then in some sense the nonlinearity $g_\epsilon$ may be interpreted as an optimal stationary approximation of $g_{SB}$. The three nonlinearities $g_0$, $g_1$, and $g_\epsilon$ are plotted in Fig. 4.4.

Another point not made obvious by Fig. 4.3 is that the switched detector is capable of large performance improvements over a fixed linear detector, ld. A plot of $ARE_{g_{SB},ld}$ is presented in Fig. 4.5. For very large values of $\sigma_1^2/\sigma_0^2$, the switched detector has a processing gain relative to ld of approximately $\epsilon\sigma_1^2/\sigma_0^2$.

Fig. 4.3. Performance comparison of fixed nonlinearity $g_\epsilon$ and switched nonlinearity $g_{SB}$ in Gaussian-Gaussian switched burst noise for various values of $\epsilon$ and range of $\sigma_1^2/\sigma_0^2$.
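The comparisons plotted in Figs. 4.3 and 4.5 can be reproduced approximately with a few lines of code. The sketch below is an assumption-laden illustration, not the computation used for the figures: the function names are hypothetical, the switched detector efficacy is taken as $(1-\epsilon)/\sigma_0^2 + \epsilon/\sigma_1^2$ (what (4.8) gives for linear LO detectors), and the efficacy of the fixed mixture nonlinearity is obtained by numerical integration on a finite grid.

```python
import numpy as np

def eta_switched(eps, var0, var1):
    """Efficacy of the ideal switched linear detector."""
    return (1.0 - eps) / var0 + eps / var1

def eta_linear(eps, var0, var1):
    """Efficacy of a fixed linear detector: reciprocal of the overall variance."""
    return 1.0 / ((1.0 - eps) * var0 + eps * var1)

def eta_mixture_lo(eps, var0, var1, n=400001):
    """Efficacy of the fixed LO nonlinearity for the Gaussian-Gaussian mixture."""
    lim = 12.0 * np.sqrt(max(var0, var1))      # finite stand-in for infinity
    x = np.linspace(-lim, lim, n)
    f0 = np.exp(-x**2 / (2 * var0)) / np.sqrt(2 * np.pi * var0)
    f1 = np.exp(-x**2 / (2 * var1)) / np.sqrt(2 * np.pi * var1)
    f = (1 - eps) * f0 + eps * f1
    g = x * ((1 - eps) * f0 / var0 + eps * f1 / var1) / f
    # For g = -f'/f, integration by parts gives E g' = E g^2,
    # so (E g')^2 / E g^2 collapses to E g^2.
    y = g**2 * f
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

eps, var0, var1 = 0.1, 1.0, 100.0
are_sb_ld = eta_switched(eps, var0, var1) / eta_linear(eps, var0, var1)
are_sb_mix = eta_switched(eps, var0, var1) / eta_mixture_lo(eps, var0, var1)
```

For these parameters the gain over the linear detector is $(1-\epsilon)^2 + \epsilon^2 + \epsilon(1-\epsilon)(\sigma_1^2/\sigma_0^2 + \sigma_0^2/\sigma_1^2)$, which for large variance ratios and small $\epsilon$ approaches the approximate processing gain quoted in the text.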

Non-Ideal  Detector  Performance 

The previous subsection discussed the performance of the switched detector under the ideal assumption of perfect knowledge of $\{e_i\}$, and that $g_0$ and $g_1$ were LO for $f_0$ and $f_1$, respectively. In any practical situation, these assumptions would almost certainly be violated. As a result, we now direct attention towards the performance of $g_{SB}$ when only an estimate $\{\hat e_i\}$ of $\{e_i\}$ is available. Also, the effect of incorrectly estimating the variance ratio in the Gaussian-Gaussian switched burst noise is assessed.

Fig. 4.4. The nonlinearity $g_\epsilon$ compared to the two linear detectors $g_0$ and $g_1$. The slopes of $g_0$ and $g_1$ are $\sigma_0^{-2}$ and $\sigma_1^{-2}$, respectively.

Fig. 4.5. Comparison of the switched detector $g_{SB}$ and the linear detector ld in Gaussian-Gaussian switched burst noise for various values of $\epsilon$ and range of $\sigma_1^2/\sigma_0^2$. [Plot: $ARE_{g_{SB},ld}$ versus $\sigma_1^2/\sigma_0^2$, logarithmic axes.]


Appendix 6.1 gives a detailed development of $\eta_{SB}$ assuming $\{e_i\}$ is
known. Appendix 6.2 notes the changes in the development arising when
the estimated sequence is used, and obtains the result

$$\eta_{SB} = \frac{\left\{(1-\epsilon)\,E_{f_0}\!\left[(1-p_{1|0})\,g_0' + p_{1|0}\,g_1'\right] + \epsilon\,E_{f_1}\!\left[p_{0|1}\,g_0' + (1-p_{0|1})\,g_1'\right]\right\}^2}{(1-\epsilon)\,E_{f_0}\!\left[(1-p_{1|0})\,g_0^2 + p_{1|0}\,g_1^2\right] + \epsilon\,E_{f_1}\!\left[p_{0|1}\,g_0^2 + (1-p_{0|1})\,g_1^2\right]} \qquad (4.11)$$

where $p_{1|0}$ is the probability of using nonlinearity $g_1$ when the true
noise density is $f_0$, and $p_{0|1}$ is the probability of using nonlinearity
$g_0$ when the true noise density is $f_1$. When both error probabilities are
identically zero, then (4.11) reduces to the special case (4.8) of operation
without switching error.


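Paralleling the discussion of the previous subsection, we shall continue
to use the Gaussian-Gaussian switched burst model. Then $f_0$ and $f_1$ are
Gaussian densities, and $g_0$ and $g_1$ again are linear detectors with
slopes $\sigma_0^{-2}$ and $\sigma_1^{-2}$, respectively. The situation may be
generalized slightly by assuming that $g_1$ has slope $\hat\sigma_1^{-2}$ not
necessarily equal to $\sigma_1^{-2}$. No additional generality in the efficacy
calculation results by allowing the slope of $g_0$ to vary from
$\sigma_0^{-2}$, as only the ratio of slopes affects performance. Under these
assumptions, (4.9) generalizes to

$$\eta_{SB} = \frac{\left[(1-\epsilon)\left(\dfrac{1-p_{1|0}}{\sigma_0^2} + \dfrac{p_{1|0}}{\hat\sigma_1^2}\right) + \epsilon\left(\dfrac{p_{0|1}}{\sigma_0^2} + \dfrac{1-p_{0|1}}{\hat\sigma_1^2}\right)\right]^2}{(1-\epsilon)\left(\dfrac{1-p_{1|0}}{\sigma_0^2} + \dfrac{p_{1|0}\,\sigma_0^2}{\hat\sigma_1^4}\right) + \epsilon\left(\dfrac{p_{0|1}\,\sigma_1^2}{\sigma_0^4} + \dfrac{(1-p_{0|1})\,\sigma_1^2}{\hat\sigma_1^4}\right)} \qquad (4.12)$$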


For the purposes of comparison, $SB_e$ will denote the switched burst
detector with error, and $SB_i$ will denote the ideal error-free switched
burst detector. The noise environment will be the Gaussian-Gaussian
switched burst environment, with $\epsilon = 0.1$ chosen as a value giving
representative numerical results.
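The efficacy expression (4.12) is easy to explore numerically. The following is a minimal sketch, assuming the form of (4.12) as read from the text (linear $g_0$ with slope $\sigma_0^{-2}$, linear $g_1$ with assumed slope $\hat\sigma_1^{-2}$); the function and argument names are illustrative and do not come from the original work.

```python
def efficacy_switched(eps, var0, var1, p10=0.0, p01=0.0, var1_hat=None):
    """Efficacy of the switched linear detector, following the form of (4.12).

    eps      : proportion of time spent in the impulsive mode
    var0     : background variance sigma_0^2
    var1     : true impulsive variance sigma_1^2
    p10      : probability of using g1 when the noise is background
    p01      : probability of using g0 when the noise is impulsive
    var1_hat : variance assumed when setting the slope of g1
               (defaults to the true value, i.e. no slope error)
    """
    if var1_hat is None:
        var1_hat = var1
    s0 = 1.0 / var0       # slope of g0
    s1 = 1.0 / var1_hat   # slope of g1 as actually implemented
    # Expected derivative of the nonlinearity in use, averaged over modes.
    mean_term = ((1 - eps) * ((1 - p10) * s0 + p10 * s1)
                 + eps * (p01 * s0 + (1 - p01) * s1))
    # Expected squared output of the nonlinearity in use, averaged over modes.
    var_term = ((1 - eps) * var0 * ((1 - p10) * s0 ** 2 + p10 * s1 ** 2)
                + eps * var1 * (p01 * s0 ** 2 + (1 - p01) * s1 ** 2))
    return mean_term ** 2 / var_term


def are_error_vs_ideal(eps, var0, var1, **errors):
    """ARE of the detector with errors (SB_e) relative to the ideal one (SB_i)."""
    return (efficacy_switched(eps, var0, var1, **errors)
            / efficacy_switched(eps, var0, var1))
```

For example, with `eps=0.1` and a variance ratio of 100, `are_error_vs_ideal(0.1, 1.0, 100.0, p01=0.05)` comes out noticeably smaller than `are_error_vs_ideal(0.1, 1.0, 100.0, p10=0.05)`, which is the asymmetry between the two error types discussed in the text.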

To begin, the effect of the errors $p_{0|1}$ and $p_{1|0}$ will be examined.
Fig. 4.6 presents a plot of $\mathrm{ARE}_{SB_e,SB_i}$ for a range of values of
$p_{0|1}$, with $p_{1|0}$ fixed at zero, and $\hat\sigma_1^2 = \sigma_1^2$.
Here, the effect of incorrectly choosing to use nonlinearity $g_0$ during the
impulsive mode is isolated. Two conclusions are obvious: first, performance
deteriorates monotonically with increasing error probability $p_{0|1}$.
Second, the effect of not recognizing an impulsive noise observation is much
worse as the variance ratio increases. However, it is reasonable to assume
that as the variance ratio increases, it is easier for an algorithm to
recognize noise bursts, and therefore $p_{0|1}$ will be small.

Figure 4.7 demonstrates the effects of deciding incorrectly that a
noise burst is present. Here, $p_{1|0}$ is allowed to vary while
$p_{0|1} = 0$, and $\hat\sigma_1^2 = \sigma_1^2$. Clearly, making this type of
error is far less damaging to performance than deciding incorrectly that the
noise is in background mode.

The combined effects of the two errors may be seen in Fig. 4.8. Here,
$p_{0|1} = 0.02$, and $p_{1|0}$ varies. The results are consistent with the
results of Figs. 4.6 and 4.7: the performance deterioration is due mainly to
$p_{0|1}$, with $p_{1|0}$ providing a lesser deterioration. The conclusion
which may be drawn from these three figures is that correctly recognizing the
presence of an impulsive burst is of critical importance to the success of
the switched burst detector. Note that this conclusion gives support to the
intuitive notion that even a few impulsive noise samples can seriously
disturb detector performance.

Fig. 4.6. Performance of switched detector with errors $SB_e$
relative to ideal switched detector $SB_i$ for various probabilities
$p_{0|1}$ of incorrectly classifying an impulsive noise sample
as background noise.

Fig. 4.7. Performance of switched detector with errors $SB_e$
relative to ideal switched detector $SB_i$ for various probabilities
$p_{1|0}$ of incorrectly classifying a background noise sample
as impulsive.

Fig. 4.8. Performance of switched detector with errors $SB_e$
relative to ideal switched detector $SB_i$ for various probabilities
$p_{1|0}$ of incorrectly classifying a background noise sample
as impulsive, with fixed probability $p_{0|1} = .02$ of classifying an
impulsive noise sample as background noise.

The effect of incorrectly choosing the slope of $g_1$ will now be examined.
Here, $p_{0|1} = p_{1|0} = 0$, and $\hat\sigma_1^2/\sigma_1^2$ is allowed to
vary. Fig. 4.9 gives $\mathrm{ARE}_{SB_e,SB_i}$ for small values of the
variance ratio, and Fig. 4.10 presents the case of large variance ratios. As
illustrated, $\hat\sigma_1^2/\sigma_1^2$ may deviate significantly from
unity, with only moderate effects on performance, especially for instances
where $\hat\sigma_1^2 \gg \sigma_1^2$. Note that, when the ratio approaches
infinity, $g_{SB}$ essentially "turns off" during impulsive bursts.
Further, as $\sigma_1^2/\sigma_0^2$ grows large, the effect of an incorrect
$\hat\sigma_1^2$ diminishes. Surprisingly, estimating $\sigma_1^2$
inaccurately does not critically affect performance. An implication of
Figs. 4.9 and 4.10 is that, when $\sigma_1^2$ must be estimated from the
noise data, good asymptotic detector performance may be maintained simply by
biasing the estimate towards large values.

3.  Discrimination  between  Noise  Modes 

In this section, an algorithm will be developed to regenerate the
switching sequence $\{e_i\}$. It was previously assumed that the sequence
$\{e_i\}$ did not switch "too rapidly". The assumption may be interpreted
here as meaning that the probability of a very short run of ones in
$\{e_i\}$ is negligible.

Parametric  Modeling 

Fig. 4.9. Performance of switched detector with errors $SB_e$
relative to ideal switched detector $SB_i$ for various errors
$\hat\sigma_1^2/\sigma_1^2 < 1$ of impulsive variance estimate.

Fig. 4.10. Performance of switched detector with errors $SB_e$
relative to ideal switched detector $SB_i$ for various errors
$\hat\sigma_1^2/\sigma_1^2 > 1$ of impulsive variance estimate.

There are a number of ways to model the statistics of the sequence $\{e_i\}$.
Gilbert [19] proposes a Markov chain taking on one of the two state values
{Background, Impulsive}, corresponding to the proposed states zero and
one, respectively. This model was used with success by Ehrman [20] in
simulating an impulsively contaminated Gaussian channel. During a run
of a particular state in $\{e_i\}$, each observation $e_i$ may be considered
as the outcome of a Bernoulli trial, with probability of success equal to
one of the two state transition probabilities $p_{1\to 0}$ or $p_{0\to 1}$.
As is well known [21], under this condition it follows that the run length,
or residence time, of each state has a geometric probability density. The
geometric density itself is a particular case of the negative binomial
density. If the transition probabilities $p_{1\to 0}$ and $p_{0\to 1}$ are
different, the residence time density for each state has a different
negative binomial density.
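The two-state Markov model can be illustrated with a short simulation. This is only a sketch under stated assumptions: the function names, the seed, and the transition probabilities used in the usage note are hypothetical and are not taken from Gilbert's or Ehrman's work.

```python
import random


def simulate_switching(n, p01, p10, seed=0):
    """Generate n samples of a two-state (0 = background, 1 = impulsive) chain.

    p01 is the per-sample probability of leaving state 0 for state 1,
    p10 the per-sample probability of leaving state 1 for state 0.
    """
    rng = random.Random(seed)
    state, seq = 0, []
    for _ in range(n):
        seq.append(state)
        leave = p01 if state == 0 else p10
        if rng.random() < leave:
            state = 1 - state
    return seq


def run_lengths(seq, state):
    """Lengths of the maximal runs of `state` in seq."""
    runs, count = [], 0
    for s in seq:
        if s == state:
            count += 1
        elif count:
            runs.append(count)
            count = 0
    if count:  # final run, possibly truncated by the end of the record
        runs.append(count)
    return runs
```

With `p10 = 0.2`, the impulsive run lengths returned by `run_lengths(simulate_switching(200000, 0.02, 0.2), 1)` average close to `1 / p10 = 5` samples, as expected for a geometric residence time.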

Another natural model for the run length statistics of $\{e_i\}$ is the
Poisson density, where the rate parameter of the density is the mean state
residence  time.  The  rate  parameters  of  the  two  states  need  not  be  ident¬ 
ical.  If  the  rate  parameter  of  a  Poisson  density  is  not  known  exactly,  and 
instead  is  distributed  as  a  gamma  density,  then  the  compound  density  is 
the  negative  binomial  density  [21,  pp.  122-3];  if  the  rate  parameter  is 
exponentially  distributed  (a  special  case  of  the  gamma  density),  then  the 
compound  density  is  the  geometric  density.  As  a  result,  the  negative 
binomial  density  is  sometimes  referred  to  as  a  gamma-mixture  Poisson 
density. 

For  equal  means,  the  variance  of  the  geometric  density  is  greater 
than  the  variance  of  the  Poisson.  This  is  to  be  expected,  as  use  of  the 
Poisson  density  model  implies  perfect  knowledge  of  the  state  mean  run 
length,  while  the  geometric  model  implies  that  only  a  statistical  descrip¬ 
tion  of  the  state  mean  run  length  is  available. 
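The variance comparison can be made explicit with the law of total variance. The following is a sketch, assuming the run length $N$ is counted from zero, is Poisson with rate $\Lambda$ when $\Lambda$ is given, and that $\Lambda$ is exponentially distributed with mean $\mu$ (the symbols $N$, $\Lambda$ and $\mu$ are introduced here for illustration only):

```latex
\begin{aligned}
E[N] &= E\big[E[N \mid \Lambda]\big] = E[\Lambda] = \mu, \\
\operatorname{Var}(N) &= E\big[\operatorname{Var}(N \mid \Lambda)\big]
   + \operatorname{Var}\big(E[N \mid \Lambda]\big)
   = E[\Lambda] + \operatorname{Var}(\Lambda) = \mu + \mu^{2}.
\end{aligned}
```

The first term, $\mu$, is the variance of a Poisson density with the same mean; the additional $\mu^2$ is contributed entirely by the uncertainty in the rate parameter.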
Using parametric statistical models for $\{e_i\}$, tests may be devised
which observe a noise sequence and generate an estimate of
the switching sequence. Various approaches include sample-by-sample
tests, pattern recognition approaches, and maximum likelihood sequence
reconstruction. With an accurate model, these approaches may quite
accurately reconstruct $\{e_i\}$. The difficulty, however, is that fairly
detailed information about the statistics of $\{e_i\}$ may be needed, and
often this information may be unavailable.

Nonparametric  Approach 

The  advantage  to  a  nonparametric  approach  is  that  detailed  statisti¬ 
cal  information  is  not  necessary  to  construct  a  test,  and  that  non¬ 
parametric  tests  are  usually  fairly  robust:  they  work  reasonably  well  over 
a  broad  range  of  situations.  Furthermore,  they  often  have  simple  struc¬ 
tures.  On  the  other  hand,  they  are  generally  less  efficient  than  optimal 
decision structures; i.e., given the same amount of data, there is a higher
probability  of  error. 

The following two-step algorithm is proposed to generate a
reconstruction of $\{e_i\}$:

(i) On a sample-by-sample basis, decide between

    $n_i \sim f_0$, say $p_i = 0$
    $n_i \sim f_1$, say $p_i = 1$          (4.13)

(ii) Filter the sequence $\{p_i\}$ to obtain $\{\hat p_i\}$. Here, filter
means to smooth in a manner which tends to reduce the
number of incorrect state transitions.


If $f_0$ and $f_1$ are known, then (4.13) may be carried out by a likelihood
ratio test. In the case where $f_0$ and $f_1$ are both zero mean Gaussian
with $\sigma_1^2 > \sigma_0^2$, the test becomes

$$|n_i| \;\underset{p_i = 0}{\overset{p_i = 1}{\gtrless}}\; T \qquad (4.14)$$

To filter $\{p_i\}$, perform the test

$$\hat p_i = \begin{cases} 1, & \sum_{k=-m}^{m} p_{i+k} \ge m+1 \\ 0, & \text{otherwise} \end{cases} \qquad (4.15)$$

for some integer $m \ge 0$. If $m = 0$, no smoothing occurs, and
$\hat p_i = p_i$ for every $i$. If $m > 0$, the test (4.15) is a voting
algorithm, where the outcome is the majority state in a window of length
$2m+1$, centered about $p_i$. Since $p_i \in \{0,1\}$, the smoothing
algorithm is a special case of the median filtering algorithm; properties of
median filtering have been studied recently by Gallagher and Wise [23]. It
will tend to preserve transitions into a new state with run length greater
than $m+1$ (edge-preserving property), while tending to suppress runs with
length less than $m$ (impulse filtering property).
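The two steps can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, and the treatment of the first and last $m$ samples (repeating the end values so that every window has $2m+1$ entries) is an assumption, since the text does not say how the ends of a record are handled.

```python
def threshold_decisions(noise, T):
    """Step (i): sample-by-sample magnitude test, as in (4.14)."""
    return [1 if abs(n) > T else 0 for n in noise]


def majority_smooth(p, m):
    """Step (ii): majority vote over a window of length 2m+1, as in (4.15).

    Indices falling outside the record are clamped to the nearest end
    sample (an assumed edge convention, not taken from the text).
    """
    n = len(p)
    out = []
    for i in range(n):
        window = [p[min(max(i + k, 0), n - 1)] for k in range(-m, m + 1)]
        out.append(1 if sum(window) >= m + 1 else 0)
    return out
```

For a binary sequence the majority vote and the median of the window coincide, so `majority_smooth` behaves as a median filter: an isolated 1 inside a long run of 0s is removed, while a run of 1s longer than the half-window survives with its edges in place.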

Nonparametric Algorithm Analysis

To calculate the error statistics of the smoothed sequence $\{\hat p_i\}$, it
is first necessary to calculate the performance of the sample-by-sample
test (4.14). As it is a binary hypothesis test on individual observations,
two errors are possible:

$$p_{1|0} = \mathrm{Prob}(\text{say } p_i = 1 \mid e_i = 0) = 2[1 - F_0(T)] \qquad (4.16)$$

$$p_{0|1} = \mathrm{Prob}(\text{say } p_i = 0 \mid e_i = 1) = 2F_1(T) - 1 \qquad (4.17)$$

where $F_0$ and $F_1$ are the cumulative distributions of $f_0$ and $f_1$,
and the densities are symmetric about zero. The correct decision
probabilities are given by

$$p_{1|1} = 1 - p_{0|1}, \qquad p_{0|0} = 1 - p_{1|0} \qquad (4.18)$$

Note that the error probabilities are a function only of the value of $e_i$.

Calculating the performance of the filtered sequence is more complex.
For convenience, we first define the vectors

$$\mathbf{e}_i = (e_{i-m}, \ldots, e_{i+m})$$

and

$$\mathbf{p}_i = (p_{i-m}, \ldots, p_{i+m})$$

and the length $2m+1$ vectors

$$\mathbf{0} = (0, \ldots, 0), \qquad \mathbf{1} = (1, \ldots, 1)$$

If $\mathbf{e}_i = \mathbf{0}$, each outcome in $\mathbf{p}_i$ is the result
of a Bernoulli trial with a constant probability of success on each trial. A
particular element $\hat p_i$ of the filtered outcome will be in error only
if at least $m+1$ elements in $\mathbf{p}_i$ are unity. The error
probability is given by the cumulative binomial distribution with constant
probabilities in each trial:

$$p_{1|0}(i) = \sum_{k=m+1}^{2m+1} \binom{2m+1}{k} (p_{1|0})^{k} (p_{0|0})^{2m-k+1} \qquad (4.19)$$

Similarly, for $\mathbf{e}_i = \mathbf{1}$,

$$p_{0|1}(i) = \sum_{k=m+1}^{2m+1} \binom{2m+1}{k} (p_{0|1})^{k} (p_{1|1})^{2m-k+1} \qquad (4.20)$$


The two remaining cases are for nonhomogeneous $\mathbf{e}_i$. This situation
occurs when the $2m+1$ filter window contains a state transition of
$\{e_i\}$; for example, $\mathbf{e}_i = (0, \ldots, 0, 1, \ldots, 1)$.

For the first remaining case, suppose $e_i = 1$, and let
$m_0 + m_1 = 2m+1$, where $m_0$ is the number of zeroes in $\mathbf{e}_i$,
and $m_1$ the number of ones. Assuming that the state run lengths are
greater than $2m+1$, a state transition in $\mathbf{e}_i$ means there are
$m_0$ zeroes followed by $m_1$ ones, or vice versa. The noise observations
are independent and each $p_i$ is an independent Bernoulli trial outcome.
However, $\mathbf{e}_i$ is not homogeneous, and the probabilities of the
outcomes of $p_i$ vary; therefore, $\mathbf{p}_i$ is distributed as the
outcome of a binomial experiment of $2m+1$ Bernoulli trials with variable
probabilities of success [22, p. 282]. The statistic
$\sum_{k=-m}^{m} p_{i+k}$ in the test (4.15) may be thought of as being the
sum $p_0 + p_1$, where $p_0$ is the binomially distributed outcome of $m_0$
Bernoulli trials with constant probabilities, and $p_1$ the outcome of $m_1$
trials. The probability of making an error, given that $e_i = 1$, is

$$p_{0|1}(i) = \sum_{j=m+1}^{2m+1} \; \sum_{k_0+k_1=j} \binom{m_0}{k_0} (p_{0|0})^{k_0} (p_{1|0})^{m_0-k_0} \binom{m_1}{k_1} (p_{0|1})^{k_1} (p_{1|1})^{m_1-k_1} \qquad (4.21)$$

where $0 \le k_0 \le m_0$ and $0 \le k_1 \le m_1$. The summation indices
$k_0$ and $k_1$ may be interpreted as the number of times in $\mathbf{p}_i$
that $p_{i+k} = 0$ given $e_{i+k} = 0$, and the number of times
$p_{i+k} = 0$ given $e_{i+k} = 1$, respectively, with $-m \le k \le m$. If
$m_0 = 0$, then (4.21) specializes to (4.20), the probability of
deciding $\hat p_i = 0$ when $e_i = 1$.

Treatment of the second remaining case of nonhomogeneous $\mathbf{e}_i$ is
similar. Given $e_i = 0$, the probability of deciding $\hat p_i = 1$ is
given by

$$p_{1|0}(i) = \sum_{j=m+1}^{2m+1} \; \sum_{k_0+k_1=j} \binom{m_0}{k_0} (p_{1|0})^{k_0} (p_{0|0})^{m_0-k_0} \binom{m_1}{k_1} (p_{1|1})^{k_1} (p_{0|1})^{m_1-k_1} \qquad (4.22)$$

where the indices $k_0$ and $k_1$ may be interpreted here as the number of
times in $\mathbf{p}_i$ that $p_{i+k} = 1$ given $e_{i+k} = 0$, and the
number of times $p_{i+k} = 1$ given $e_{i+k} = 1$, respectively, for
$-m \le k \le m$. If $p_{0|1} = p_{1|0}$, then (4.21) and (4.22) are
symmetric in $m_0$ and $m_1$.
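The double sums in (4.21) and (4.22) can be evaluated directly. The sketch below assumes the window contains $m_0$ background samples and $m_1 = 2m+1-m_0$ impulsive samples, that the raw decisions are independent, and that the smoothed decision is the majority vote of (4.15); the function names are illustrative.

```python
from math import comb


def binom_pmf(n, k, q):
    """Probability of k successes in n independent trials with success prob q."""
    return comb(n, k) * q ** k * (1 - q) ** (n - k)


def smoothed_error(m, m0, p10, p01, true_state):
    """Probability that the majority vote over 2m+1 raw decisions is wrong.

    m0         : number of window positions whose true state is 0
    p10, p01   : raw (unsmoothed) error probabilities
    true_state : true state e_i at the window centre (0 or 1)
    """
    m1 = 2 * m + 1 - m0
    if true_state == 1:
        # Error means a majority of raw decisions equal 0.
        q0, q1 = 1 - p10, p01      # P(raw = 0 | e = 0), P(raw = 0 | e = 1)
    else:
        # Error means a majority of raw decisions equal 1.
        q0, q1 = p10, 1 - p01      # P(raw = 1 | e = 0), P(raw = 1 | e = 1)
    total = 0.0
    for k0 in range(m0 + 1):
        for k1 in range(m1 + 1):
            if k0 + k1 >= m + 1:
                total += binom_pmf(m0, k0, q0) * binom_pmf(m1, k1, q1)
    return total
```

With `m0 = 0` and `true_state = 1` the function reduces to the binomial tail of (4.20), and with equal raw error probabilities the two cases are mirror images of each other, for example `smoothed_error(4, 4, 0.1, 0.1, 1)` and `smoothed_error(4, 5, 0.1, 0.1, 0)` agree.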

While the error probability $p$ for elements of the unsmoothed
sequence $\{p_i\}$ is a function only of $e_i$, after smoothing the sequence
the error probability $p(i)$ is a function of the subsequence
$\mathbf{e}_i$. The performance analysis of Appendix 4.2 requires time
invariant error probabilities. If the statistics of the state run lengths of
$\{e_i\}$ are known, then the expectation of $p_{0|1}(i)$ and $p_{1|0}(i)$
may be taken and used in evaluating (4.12).

The choice of filter length $2m+1$ affects the error performance.
There are two competing considerations: on the one hand, it is desirable
to make $2m+1$ as large as possible, for the error probabilities decrease
with increasing filter length, provided a state transition does not occur
within $\mathbf{e}_i$. On the other hand, making $2m+1$ small reduces the
probability of making errors in the vicinity of a state transition.

The following argument will assist in choosing $2m+1$: For the test
(4.15) to recognize a state transition, at least $m+1$ elements of
$\mathbf{p}_i$ must take on the value of the new state. If the last $m+2$
elements of $\mathbf{e}_i$ belong to the new state, on the average the last
$(m+2)p_{x|x}$ elements of $\mathbf{p}_i$ will take on the new state value,
and conversely if the first $m+2$ elements of $\mathbf{e}_i$ take the value
of the old state. It is reasonable to choose $m$ so that
$(m+2)p_{x|x} > m+1$, ensuring that $\mathbf{p}_{i\pm 2}$ contains on the
average at least $m+1$ correct state decisions, given that a state
transition occurs between $e_i$ and $e_{i\pm 1}$.

A simple manipulation shows this condition is equivalent to

$$2m+1 < \min_{x \in \{0,1\}} \frac{3p_{x|x} - 1}{1 - p_{x|x}} \qquad (4.23)$$

The minimum nontrivial filter length is $2m+1 = 3$, which leads to the
requirement that $\min_{x \in \{0,1\}} p_{x|x} > \frac{2}{3}$.
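For reference, the manipulation connecting the condition $(m+2)p_{x|x} > m+1$ to the filter length bound can be written out as follows. This is a sketch of the algebra, assuming $p_{x|x} < 1$ so that division by $1 - p_{x|x}$ preserves the inequality:

```latex
\begin{aligned}
(m+2)\,p_{x|x} > m+1
  &\iff m\,(1 - p_{x|x}) < 2p_{x|x} - 1 \\
  &\iff m < \frac{2p_{x|x} - 1}{1 - p_{x|x}} \\
  &\iff 2m+1 < \frac{2\,(2p_{x|x} - 1) + (1 - p_{x|x})}{1 - p_{x|x}}
      = \frac{3p_{x|x} - 1}{1 - p_{x|x}} .
\end{aligned}
```

Setting $2m+1 = 3$ gives $3(1 - p_{x|x}) < 3p_{x|x} - 1$, that is, $p_{x|x} > 2/3$, and the bound must hold for both states, hence the minimum over $x$.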


Performance of the Nonparametric Algorithm

As was shown in the last subsection, three parameters determine the
error performance of the sequence estimation algorithm: the
sample-by-sample decision errors $p_{0|1}$ and $p_{1|0}$, and the filter
length $2m+1$. The effect of these parameters is examined in the following
set of figures.

The performance of the threshold test (4.14) is a function of the
threshold $T$, and the densities $f_0$ and $f_1$. In particular, when the
two densities are zero mean Gaussian densities with variance ratio
$\sigma_1^2/\sigma_0^2$, the decision probabilities become

$$p_{1|0} = 2\,\Phi(-T/\sigma_0) \qquad (4.24)$$

and

$$p_{1|1} = 2\,\Phi(-T/\sigma_1) \qquad (4.25)$$

where $\Phi$ is the cumulative distribution function of the unit variance
Gaussian density. It is natural to present the performance of the test
(4.14) via a set of receiver operating curves, shown in Fig. 4.11. As is
intuitively obvious, and clear from the figure, the probability of
recognizing an impulsive sample increases as the distinction between
background and impulsive variances increases.
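The two decision probabilities are simple to tabulate. The sketch below uses only the standard normal distribution function, written with `math.erf`; the function names are illustrative.

```python
from math import erf, sqrt


def std_normal_cdf(x):
    """Cumulative distribution function of the unit variance Gaussian."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def decision_probs(T, sigma0, sigma1):
    """Return (p_1|0, p_1|1) for the magnitude test |n| > T.

    p_1|0 is the probability of calling a background sample impulsive,
    p_1|1 the probability of correctly calling an impulsive sample impulsive.
    """
    p10 = 2.0 * std_normal_cdf(-T / sigma0)
    p11 = 2.0 * std_normal_cdf(-T / sigma1)
    return p10, p11
```

Sweeping `T` for a fixed pair of standard deviations traces out one operating curve; repeating the sweep for several values of `sigma1 / sigma0` gives a family of curves of the kind described for Fig. 4.11.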

The next three figures consider the effects of $p_{0|1}$ and $p_{1|0}$ upon
the performance of the filtered sequence $\{\hat p_i\}$. Here, the median
filter has a fixed window length $2m+1$, and the values of
$p_{0|1} = p_{1|0}$ are allowed to vary. The left side of the plot
represents situations where $e_i = 1$, and the majority of the states in the
observation window are ones. Thus, $m+1 \le m_1 \le 2m+1$. The right side of
the plot represents situations where $e_i = 0$, and the majority of states
in the observation window are zeroes. Thus, $m+1 \le m_0 \le 2m+1$. The
implicit assumption in the performance plots is that the state run lengths
are always greater than $2m+1$. If a state run length were less than $m+1$,
the impulse filtering property of the median filter would tend to suppress
recognition of such a short state run. As a result, when the state run
lengths grow small relative to $m+1$, the error probabilities
asymptotically approach unity.

Fig. 4.11. Operating characteristic for the threshold test giving
$p_i$ for Gaussian-Gaussian switched burst noise. Note that
performance is not a function of $\epsilon$.

Fig. 4.12. Error performance of smoothed sequence $\{\hat p_i\}$
evaluated for various threshold test error probabilities
$p_{0|1} = p_{1|0}$ when $\mathbf{e}_i$ contains a state transition. Filter
length is $2m+1 = 9$.

Fig. 4.13. Error performance of smoothed sequence $\{\hat p_i\}$
evaluated for various threshold error probabilities when $\mathbf{e}_i$
contains a state transition. Here, the effect of $p_{0|1} \ne p_{1|0}$ may
be seen. Filter length is $2m+1 = 9$.

Fig. 4.12 examines the effect of various values $p_{0|1} = p_{1|0}$ with the
filter length fixed at $2m+1 = 9$. Error probabilities of the smoothed
sequence increase monotonically with increasing probability that $p_i$ is in
error. Notice that when $e_i$ is near a state transition, correct
reconstruction of a state value becomes orders of magnitude more
difficult than when $e_i$ is not near a transition. When no state
transitions occur within the filter window, $\mathbf{e}_i$ is either
$\mathbf{0}$ or $\mathbf{1}$. This condition will be denoted as steady
state, and the probability that $\hat p_i$ is incorrectly classified is at a
minimum. The steady state error probabilities are the quantities

$$p_{1|0}(i)\,\Big|_{m_0 = 2m+1} \qquad (4.26)$$

and

$$p_{0|1}(i)\,\Big|_{m_1 = 2m+1} \qquad (4.27)$$


The effect of varying $p_{0|1}$ while keeping $p_{1|0}$ fixed is examined in
Fig. 4.13. By symmetry, conclusions from this case may be applied to the
complementary situation. Unexpectedly, increasing $p_{0|1}$ decreases the
smoothed error probability $p_{1|0}(i)$. This effect is operative only when
$e_i$ is near a state transition, and the effect diminishes as $e_i$ moves
away from the state transition. For example, assume $e_i = 0$. Then errors
in $p_i$ after the $0 \to 1$ or prior to the $1 \to 0$ transition in
$\{e_i\}$ contribute favorably to the filter test statistic when $e_i = 0$.
As the state transition propagates through $\mathbf{e}_i$, there are fewer
opportunities for elements of $\mathbf{p}_i$ to incorrectly take on state
value zero within the filter window. Thus, larger values of $p_{0|1}$ tend
to diminish the probability of failing to recognize that $e_i = 0$.

The effect of changing the filter window length is examined in Fig.
4.14. Here, $p_{0|1} = p_{1|0} = .90$, and $2m+1$ is allowed to vary. As the
filter length grows, it becomes relatively more difficult near a state
transition in $\{e_i\}$ to properly classify $\hat p_i$. However, this is
compensated by the fact that the steady state error probabilities diminish
rapidly with increasing filter length. Note that, when $\mathbf{e}_i$ is in
steady state, the error probabilities of $\{\hat p_i\}$ again are functions
only of $e_i$.

Fig. 4.14. Error performance of smoothed sequence $\{\hat p_i\}$
evaluated for different filter lengths $2m+1$ with
$p_{0|1} = p_{1|0} = .90$ when $\mathbf{e}_i$ contains a state transition.

4.  Simulation 

The algorithm developed in this chapter was applied using the
selected high kurtosis Arctic under-ice noise data to simulate a noise
source. This data was described in Appendix 2.1 of Chapter 2, and used in
the simulations of the previous chapter. As before, the mean of each of
the 58 selected blocks was adjusted to zero.

To carry out the test (4.14) and form the sequence $\{p_i\}$, the test
threshold was chosen as $1.282\,\hat\sigma$, where $\hat\sigma^2$ is the
variance of each block, calculated as

$$\hat\sigma^2 = \frac{1}{1024} \sum_{i=1}^{1024} n_i^2$$


The value $T = 1.282\,\hat\sigma$ corresponds to an error rate of
$p_{1|0} = .20$ if the noise distribution is indeed Gaussian. If the noise
is a background Gaussian noise with a high variance Gaussian impulsive
contaminant, then $\hat\sigma^2$ overestimates the background variance, and
$p_{1|0} < .2$. Similarly, $\hat\sigma^2$ underestimates the variance of the
impulsive component. In typical situations, the impulsive component is
present for only a small proportion of the time, and the variance ratio
$\sigma_1^2/\sigma_0^2 \gg 1$. Therefore, while a threshold of
$1.282\,\hat\sigma$ would correspond to an error rate of $p_{0|1} = .80$ if
the noise samples belonged exclusively to the impulsive mode, it is far more
likely that $p_{0|1} \ll .80$ under the stated conditions.

With the threshold set at $T = 1.282\,\hat\sigma$, the sequence $\{p_i\}$
was formed. Various window lengths were used to smooth $\{p_i\}$, and the
best overall estimated performance (described later) was obtained with a
window length $2m+1 = 7$. Fig. 4.15 presents a representative block of
noise data and the corresponding subsequence of $\{\hat p_i\}$.

The  non-Gaussian  nature  of  the  noise  distribution  is  demonstrated  in 
Fig.  4.16,  a  Q-Q  plot  of  the  empirical  distribution  of  a  sample  noise  data 
block  versus  the  unit  variance  Gaussian  distribution.  In  this  plot,  a  Gaus¬ 
sian  sample  distribution  would  appear  as  a  straight  line.  For  noise  sam¬ 
ples  near  the  mean,  the  plot  is  approximately  linear.  For  large  samples, 
the  empirical  noise  distribution  has  a  spread  greater  than  that  of  the 
Gaussian  distribution.  Thus,  it  may  be  concluded  that  the  noise  sample  is 
heavier-tailed than a Gaussian density.

Fig. 4.15. Comparison of sample Arctic under-ice data
record 2220 and corresponding subsequence of $\{\hat p_i\}$. Vertical
scale of the noise is in standard deviations from the mean. A
threshold of $T = 1.282\,\hat\sigma$ and filter length $2m+1 = 7$ were used.

Fig. 4.16. Q-Q plot showing sample quantiles of sample Arctic
under-ice data record 2220 prior to processing versus the
quantiles of a zero mean unit variance Gaussian distribution.

Fig. 4.17. Normalized sample Arctic under-ice noise data
record 2220 after $\{\hat p_i\}$ and $\hat\sigma_1^2/\hat\sigma_0^2$ are
used to adjust the data. Vertical scale is in standard deviations from the
mean.

Fig. 4.18. Q-Q plot showing sample quantiles of normalized
Arctic under-ice noise data record 2220 after $\{\hat p_i\}$ and
$\hat\sigma_1^2/\hat\sigma_0^2$ are used to adjust the data versus the
quantiles of a zero mean unit variance Gaussian distribution.

Since the smoothed switching sequence estimate $\{\hat p_i\}$
classifies each noise sample as either a background or impulsive noise
process observation, the noise samples may be segregated, and the
variances $\sigma_0^2$ and $\sigma_1^2$ may be estimated. Using these
estimates and the sequence $\{\hat p_i\}$, each noise sample in the
observation block may be normalized to unit variance. Fig. 4.17 presents the
data of Fig. 4.15 after this adjustment. Distinct spikes no longer appear in
the plot, save for a single spike near sample 800, where $\{\hat p_i\}$ may
be in error. The Q-Q plot of the normalized data is shown in Fig. 4.18. The
resulting plot is more nearly a straight line, indicating that the
normalized data is now more nearly Gaussian. Therefore, we may conclude that
the algorithm provided an effective means of distinguishing between the
background and impulsive noise samples.
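The segregation and normalization step might be sketched as follows. This is an illustration under stated assumptions: the block is taken to be zero mean (as in the simulation), each variance is estimated as the mean square of the samples assigned to that state, and an empty class falls back to unit variance; the function name and the fallback are hypothetical and are not described in the text.

```python
def normalize_by_state(noise, labels):
    """Estimate per-state variances and scale each sample to unit variance.

    noise  : zero-mean noise samples
    labels : estimated state per sample (0 = background, 1 = impulsive)
    Returns (normalized samples, background variance, impulsive variance).
    """
    groups = {0: [], 1: []}
    for n, s in zip(noise, labels):
        groups[s].append(n)
    # Mean-square estimate per class; unit variance if a class is empty
    # (assumed fallback so that the scaling below is always defined).
    var = {s: (sum(x * x for x in g) / len(g) if g else 1.0)
           for s, g in groups.items()}
    normalized = [n / var[s] ** 0.5 for n, s in zip(noise, labels)]
    return normalized, var[0], var[1]
```

The ratio of the two returned variances is the estimated variance ratio for the block, and the fraction of labels equal to 1 is the estimated impulsive proportion.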

For the particular data block of Figs. 4.15 - 4.17, approximately 5.6%
of the samples are classified as impulsive noise, and the variance ratio is
estimated as $\hat\sigma_1^2/\hat\sigma_0^2 = 31.3$. These estimates may be
used in (4.9) to estimate $\eta_{SB}$ by simply setting $\sigma_0^2 = 1$,
and letting $\sigma_1^2 = \hat\sigma_1^2/\hat\sigma_0^2$. In this case,
$\mathrm{ARE}_{SB,ld} = 2.55$ is the estimated performance improvement.

Using the switched burst detector algorithm, all 58 of the high kurtosis
data blocks may be analyzed, allowing $\epsilon$ and
$\sigma_1^2/\sigma_0^2$ to be estimated for each data block. Fig. 4.19
gives the estimate of $\epsilon$, and the estimate of the variance ratio is
given in Fig. 4.20. Fig. 4.21 presents the values of
$\mathrm{ARE}_{SB,ld}$ for each data block derived by substituting these
estimates into (4.9).

Over the 58 data blocks the cumulative average parameters were
computed, giving $\epsilon = .089$, and $\sigma_1^2/\sigma_0^2 = 9.03$.
These parameter values lead to $\mathrm{ARE}_{SB,ld} = 1.58$ as an estimate
of the processing gain.
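The two quoted gains can be reproduced with a few lines of Python. The sketch assumes, since (4.9) is not restated in this section, that the error-free switched detector has efficacy $(1-\epsilon)/\sigma_0^2 + \epsilon/\sigma_1^2$ and that the linear detector has efficacy equal to the reciprocal of the overall noise variance $(1-\epsilon)\sigma_0^2 + \epsilon\sigma_1^2$; the function name is illustrative.

```python
def are_switched_vs_linear(eps, ratio):
    """Estimated ARE of the ideal switched detector relative to ld.

    eps   : proportion of samples in the impulsive mode
    ratio : variance ratio sigma_1^2 / sigma_0^2 (sigma_0^2 is set to 1)
    """
    eta_sb = (1.0 - eps) + eps / ratio            # assumed form of (4.9)
    eta_ld = 1.0 / ((1.0 - eps) + eps * ratio)    # reciprocal of total variance
    return eta_sb / eta_ld
```

Under these assumptions, `are_switched_vs_linear(0.056, 31.3)` evaluates to about 2.55 and `are_switched_vs_linear(0.089, 9.03)` to about 1.58, in agreement with the values reported for the single block and for the cumulative averages.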

The switched burst detector may be compared to the adaptive detectors
of the previous chapter. Fig. 4.22 shows $\mathrm{ARE}_{SB,ld}$ plotted with
the corresponding ARE values of the two adaptive detectors. Here, it is
clear that the switched detector outperforms the non-switching adaptive
detectors. This result is not unexpected, for the results of this chapter
indicate that $g_{SB}$ outperforms $g_\epsilon$ in Gaussian-Gaussian
switched burst noise, and the last chapter indicates that $g_\epsilon$
slightly outperforms both adaptive nonlinearities in Gaussian-Gaussian
$\epsilon$-mixture noise. As noted earlier, the expression for the efficacy
of a fixed detector in Gaussian-Gaussian $\epsilon$-mixture noise is
identical to the expression for efficacy in Gaussian-Gaussian switched burst
noise when $g_0 = g_1 = g$, where $g$ is some fixed arbitrary detector. It
follows, then, that $g_{SB}$ will outperform any fixed detector nonlinearity
if the Arctic under-ice noise is indeed a Gaussian-Gaussian switched burst
noise.

Fig. 4.21. Estimated performance of switched burst detector
SB relative to a linear detector ld for each sample noise data
block.

[Fig. 4.22: estimated performance of the switched burst detector relative to
ld, plotted with] the estimated performance of the two adaptive
nonlinearities of the previous chapter relative to ld (solid lines) for each
sample noise data block.

5.  Conclusion 

We have presented an argument in favor of a time-varying nonlinearity
for use in a LO detector structure when the signal is embedded in a
type of impulsive noise that is classified here as a switched burst noise.
The nonstationary part of the structure is easy to implement: it merely
requires switching the observations between two fixed nonlinearities.
Analysis of the algorithm indicates that this detector is capable of
improved performance over a fixed structure. A simple technique for
determining the presence of noise bursts has also been proposed, and its
performance was examined.

It may be argued that the additional complexity of two nonlinearities
and a structure to estimate the switching sequence $\{e_i\}$ is not warranted
by the relatively modest improvement over the fixed nonlinearity $g_\varepsilon$.
Several points are in order. First, a complex nonlinearity is replaced by
two linear amplifiers and a switch. Second, the exact shape of $g_\varepsilon$ is a
function of the impulsive proportion $\varepsilon$ and the variance ratio
$\sigma_1^2/\sigma_0^2$. In the proposed algorithm, $\{e_i\}$ would be determined by
observation of the noise behavior, and the exact value of $\varepsilon$ is of no
importance to the switched detector. The ratio of impulsive to background
variance is important, however, since it will determine the gain ratio of
the two linear amplifiers. This is easy to calculate, since $\{e_i\}$ separates
the noise observations into a stream of observations from the impulsive
noise process and a stream of observations from the background noise
process. As a result, the two variances may be calculated in a
straightforward and appealingly natural manner.
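The variance calculation described above can be sketched in a few lines. This is only an illustration under stated assumptions: the switching sequence is taken as known, the noise is zero mean, the switching sequence is drawn iid rather than in bursts, and the helper name `stream_variances` and all numerical values are hypothetical. For Gaussian components the LO nonlinearity is $x/\sigma^2$, so the ratio of the two amplifier gains is the ratio of the two estimated variances.

```python
import random
import statistics

def stream_variances(x, e):
    """Split observations by the switching sequence and estimate each variance.

    x: noise observations; e: switching sequence (0 = background, 1 = impulsive).
    Zero-mean noise is assumed, so the mean is fixed at 0 rather than estimated.
    """
    background = [xi for xi, ei in zip(x, e) if ei == 0]
    impulsive = [xi for xi, ei in zip(x, e) if ei == 1]
    var0 = statistics.pvariance(background, mu=0.0)
    var1 = statistics.pvariance(impulsive, mu=0.0)
    return var0, var1

# Hypothetical test data: 10% impulsive samples, variance ratio 100.
random.seed(0)
e = [1 if random.random() < 0.1 else 0 for _ in range(20000)]
x = [random.gauss(0.0, 10.0 if ei else 1.0) for ei in e]

var0_hat, var1_hat = stream_variances(x, e)
# Gain of the background amplifier relative to the impulsive one
# (linear LO detectors x / var0 and x / var1).
gain_ratio = var1_hat / var0_hat
print(var0_hat, var1_hat, gain_ratio)
```

Note that the impulsive proportion never enters the computation, which matches the remark that its exact value is of no importance to the switched detector.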

An  assumption  made  in  the  example  was  that  the  impulsive  com¬ 
ponent  could  be  adequately  modeled  by  a  high  variance  Gaussian  density. 
It  might  be  desirable  to  use  some  other  heavy  tailed  noise  to  model  the 
impulsive  component,  for  instance,  a  Laplace  density.  This  has  some 
intuitive  appeal:  It  may  be  assumed  that  the  impulsive  component  itself 
may  be  modeled  with  an  additive  mixture  density  in  a  fashion  similar  to 
Huber  [18].  Then,  as  the  contamination  parameter  approaches  unity,  the 
mixture  density  approaches  the  Laplace  density,  whose  LO  nonlinearity  is 
the sign detector. Thus, $g_0$ would be a linear detector, and $g_1$ a sign
detector. Alternatively, $g_1$ might be chosen to be an amplifier-limiter, a
noise  blanker,  or  some  other  nonlinearity  that  gives  the  test  statistic  a 
degree  of  robustness  against  impulsive  noise  bursts. 

One  interpretation  of  the  proposed  structure  is  that  switching 
between  two  linear  detectors  is  not  necessary.  Instead,  the  proposed 
structure  could  be  regarded  as  a  linear  processor  with  some  sort  of 
automatic  gain  control,  which  can  quickly  and  accurately  adjust  an 
amplifier  gain  and  hold  the  noise  variance  constant.  A  linear  detector 
with continuously adjustable gain is equivalent to the limiting case $M \to \infty$
of a switched burst detector where the detector switches between $M$
linear amplifiers.

Appendix 4.1


In this appendix, an expression is developed for the efficacy of a
detector that switches between two nonlinearities $g_0$ and $g_1$ in
accordance with a control sequence $\{e_i\}$. For the sake of compactness,
$E_{H_0}$ denotes the $n$-fold expectation with respect to the density

$$\prod_{i=1}^{n}\left[(1-e_i)\,f_0(x_i) + e_i\,f_1(x_i)\right]$$

and $E_{f_0}$ denotes univariate expectation with respect to $f_0$.

The efficacy of a detector using test statistic $T_n$ is defined by [5,6,9]
as

$$\eta_{T_n} = \lim_{n\to\infty} \frac{\left[\left.\dfrac{\partial}{\partial s} E_{H_1} T_n\right|_{s=0}\right]^2}{n\,\mathrm{var}_{H_0} T_n} \tag{A4.1}$$

A regularity condition causes the signal $s$ to vanish asymptotically,
ensuring that the probability of detection does not converge to unity as $n$
grows without bound [5,9]. Another interpretation is that (A4.1) is an
incremental signal-to-noise ratio [6,7], and the regularity conditions
guarantee that as $n \to \infty$, the incremental SNR remains finite.

The test statistic for the switched burst detector can be written as

$$T_n = \sum_{i=1}^{n}\left[(1-e_i)\,g_0(x_i) + e_i\,g_1(x_i)\right] \tag{A4.2}$$

since $e_i$ takes on only the values of zero or unity. Then

$$E_{H_1} T_n = \sum_i \int \left[(1-e_i)\,g_0(x_i+s) + e_i\,g_1(x_i+s)\right]\left[(1-e_i)\,f_0(x_i) + e_i\,f_1(x_i)\right] dx_i \tag{A4.3}$$

We use the fact that $(1-e_i)^2 = (1-e_i)$, that $e_i^2 = e_i$, and that
$e_i(1-e_i) = 0$ to obtain

$$E_{H_1} T_n = \sum_i (1-e_i)\,E_{f_0}\,g_0(x_i+s) + \sum_i e_i\,E_{f_1}\,g_1(x_i+s) \tag{A4.4}$$

Finally, making the usual assumption that the order of expectation and
differentiation may be interchanged, we have

$$\left.\frac{\partial}{\partial s} E_{H_1} T_n\right|_{s=0} = E_{f_0}\,g_0' \sum_{i=1}^{n}(1-e_i) + E_{f_1}\,g_1' \sum_{i=1}^{n} e_i \tag{A4.5}$$


Without loss of generality, we will assume $g_0$ has zero mean under $f_0$,
and $g_1$ has zero mean under $f_1$. Then $\mathrm{var}_{H_0} T_n = E_{H_0} T_n^2$. Here,

$$E_{H_0} T_n^2 = E_{H_0}\left\{\sum_{i=1}^{n}\left[(1-e_i)\,g_0(x_i) + e_i\,g_1(x_i)\right]\right\}^2 \tag{A4.6}$$

The summands in (A4.6) may be rearranged to obtain

$$E_{H_0} T_n^2 = E_{H_0}\sum_i\left[(1-e_i)\,g_0^2(x_i) + e_i\,g_1^2(x_i)\right]
+ E_{H_0}\sum_i\sum_{j\ne i}\left[(1-e_i)\,g_0(x_i) + e_i\,g_1(x_i)\right]\left[(1-e_j)\,g_0(x_j) + e_j\,g_1(x_j)\right] \tag{A4.7}$$

The sequence $\{x_i\}$ is independent, and $g_0$ and $g_1$ are memoryless
transformations; therefore the second expectation on the right side of
(A4.7) equals zero. Thus,

$$\mathrm{var}_{H_0} T_n = E_{f_0}\,g_0^2 \sum_i (1-e_i) + E_{f_1}\,g_1^2 \sum_i e_i \tag{A4.8}$$
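Substituting (A4.5) and (A4.8) into (A4.1), we find

$$\eta_{T_n} = \lim_{n\to\infty} \frac{\left[\sum_i (1-e_i)\,E_{f_0}\,g_0' + \sum_i e_i\,E_{f_1}\,g_1'\right]^2}{n\left[\sum_i (1-e_i)\,E_{f_0}\,g_0^2 + \sum_i e_i\,E_{f_1}\,g_1^2\right]} \tag{A4.9}$$

After multiplying numerator and denominator by $n^{-2}$ and taking the limit,
with $\varepsilon$ denoting the limiting fraction of observations for which
$e_i = 1$, we have

$$\eta_{T_n} = \frac{\left[(1-\varepsilon)\,E_{f_0}\,g_0' + \varepsilon\,E_{f_1}\,g_1'\right]^2}{(1-\varepsilon)\,E_{f_0}\,g_0^2 + \varepsilon\,E_{f_1}\,g_1^2} \tag{A4.10}$$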
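As a worked illustration of the closed form reached in this appendix, the sketch below evaluates the switched-detector efficacy for two linear amplifiers in Gaussian-Gaussian switched burst noise and compares it with a single fixed linear detector. The function name and the numerical values are hypothetical; the moments used are the standard ones for zero-mean Gaussian densities.

```python
def switched_efficacy(eps, d0, d1, m0, m1):
    """Efficacy of a two-nonlinearity switched detector.

    d_k = E_{f_k} g_k' (mean slope), m_k = E_{f_k} g_k^2 (mean square),
    eps = limiting fraction of impulsive observations.
    """
    return ((1 - eps) * d0 + eps * d1) ** 2 / ((1 - eps) * m0 + eps * m1)

# Hypothetical parameters: 10% impulsive samples, variance ratio 100.
eps, var0, var1 = 0.1, 1.0, 100.0

# Switched linear amplifiers g_k(x) = x / var_k in zero-mean Gaussian f_k:
#   E g_k' = 1 / var_k   and   E g_k^2 = var_k / var_k**2 = 1 / var_k.
eta_switched = switched_efficacy(eps, 1 / var0, 1 / var1, 1 / var0, 1 / var1)

# Fixed linear detector g_0 = g_1 = x:  E g' = 1,  E_{f_k} g^2 = var_k.
eta_linear = switched_efficacy(eps, 1.0, 1.0, var0, var1)

print(eta_switched, eta_linear, eta_switched / eta_linear)
```

With these values the fixed linear detector's efficacy is the reciprocal of the total noise variance, while the switched detector's efficacy is the weighted sum of the reciprocal variances, which is why the gain is large when the impulsive variance dominates.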
Appendix 4.2

The previous appendix formulated the efficacy for the switched burst
detector assuming perfect knowledge of the switching sequence $\{e_i\}$. In
this appendix the result (A4.10) is extended to account for errors.

Suppose $\{e_i\}$ is the true sequence, but errors are made randomly in
choosing between detectors $g_0$ and $g_1$. Let $p_i \in \{0, 1\}$ represent the
decision at observation time $i$ to choose $g_0$ or $g_1$, respectively. In the
ideal case, $p_i = e_i$ for $-\infty < i < \infty$. To model the effect of errors, let

$$\mathrm{Prob}(p_i = 1 \mid e_i = 0) = p_{1|0} \tag{A4.11}$$

$$\mathrm{Prob}(p_i = 0 \mid e_i = 1) = p_{0|1} \tag{A4.12}$$

be the posterior probabilities of determining $p_i$ incorrectly, where the
posterior probabilities of correct detection are given by

$$p_{0|0} = 1 - p_{1|0} \tag{A4.13}$$

$$p_{1|1} = 1 - p_{0|1} \tag{A4.14}$$

Clearly, it is desirable to have $p_{1|0}$ and $p_{0|1}$ as near zero as possible.
From the point of view of the detection system, $\{e_i\}$ is a deterministic
sequence, and $\{p_i\}$ is a noisy estimate of $\{e_i\}$. It is assumed that
(A4.11)-(A4.14) are time invariant.

Rather than repeat the derivation of Appendix 4.1, only the
significant modifications in the derivation will be noted. The correct
decision sequence $\{e_i\}$ in (A4.2) and its sequels may be replaced by the
corrupted decision sequence $\{p_i\}$. Thus, the test statistic $T_n$ becomes

$$T_n = \sum_i\left[(1-p_i)\,g_0(x_i) + p_i\,g_1(x_i)\right] \tag{A4.15}$$

The expectation of $p_i$ may be taken with respect to its posterior
distribution:

if $e_i = 1$, then $E_{p_i|e_i}\,p_i = \sum_{k=0}^{1} k\,\mathrm{Prob}(p_i = k \mid e_i = 1) = p_{1|1}$

if $e_i = 0$, then $E_{p_i|e_i}\,p_i = \sum_{k=0}^{1} k\,\mathrm{Prob}(p_i = k \mid e_i = 0) = p_{1|0}$

Therefore

$$E_{p_i|e_i}\,p_i = e_i\,p_{1|1} + (1-e_i)\,p_{1|0} \tag{A4.17}$$

$$E_{p_i|e_i}(1-p_i) = e_i\,p_{0|1} + (1-e_i)\,p_{0|0} \tag{A4.18}$$

Applying these results, the expectation of $T_n$ with respect to the
posterior distribution of $p_i$ may be written as

$$E_{p_i|e_i}\,T_n = \sum_i (1-e_i)\left[p_{0|0}\,g_0(x_i) + p_{1|0}\,g_1(x_i)\right]
+ \sum_i e_i\left[p_{0|1}\,g_0(x_i) + p_{1|1}\,g_1(x_i)\right] \tag{A4.16}$$


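Following the same arguments as in Appendix 4.1,

$$\left.\frac{\partial}{\partial s} E_{H_1} E_{p_i|e_i}\,T_n\right|_{s=0}
= E_{f_0}\left[p_{0|0}\,g_0' + p_{1|0}\,g_1'\right]\sum_{i=1}^{n}(1-e_i)
+ E_{f_1}\left[p_{0|1}\,g_0' + p_{1|1}\,g_1'\right]\sum_{i=1}^{n} e_i \tag{A4.19}$$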


To compute $\mathrm{var}_{H_0} T_n$, the arguments in Appendix 4.1 are paralleled,
giving

$$E_{p_i|e_i}\,T_n^2 = \sum_i E_{p_i|e_i}\left[(1-p_i)\,g_0(x_i) + p_i\,g_1(x_i)\right]^2
+ \sum_i\sum_{j\ne i} E_{p_i,p_j|e_i,e_j}\left[(1-p_i)\,g_0(x_i) + p_i\,g_1(x_i)\right]\left[(1-p_j)\,g_0(x_j) + p_j\,g_1(x_j)\right]$$

The single summation becomes

$$\sum_i\left\{(1-e_i)\left[p_{0|0}\,g_0^2(x_i) + p_{1|0}\,g_1^2(x_i)\right] + e_i\left[p_{0|1}\,g_0^2(x_i) + p_{1|1}\,g_1^2(x_i)\right]\right\}$$


Depending on the reconstruction algorithm, $\{p_i\}$ may or may not be an
independent sequence, so the double summation term cannot be dropped
after expectation with respect to $p_i \mid e_i$. However, every term contains a
cross product $g(x_i)\,g(x_j)$ with $i \ne j$. Thus, the expectation with respect
to $H_0$ of each term in the double summation is zero, for $\{x_i\}$ is an
independent sequence, and $g_0$ and $g_1$ are memoryless. Therefore,

$$E_{H_0} E_{p_i|e_i}\,T_n^2 = E_{f_0}\left[p_{0|0}\,g_0^2 + p_{1|0}\,g_1^2\right]\sum_i (1-e_i)
+ E_{f_1}\left[p_{0|1}\,g_0^2 + p_{1|1}\,g_1^2\right]\sum_i e_i \tag{A4.20}$$

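Following a computation similar to (A4.9), it may be concluded that

$$\eta_{T_n} = \frac{\left\{(1-\varepsilon)\,E_{f_0}\left[p_{0|0}\,g_0' + p_{1|0}\,g_1'\right] + \varepsilon\,E_{f_1}\left[p_{0|1}\,g_0' + p_{1|1}\,g_1'\right]\right\}^2}{(1-\varepsilon)\,E_{f_0}\left[p_{0|0}\,g_0^2 + p_{1|0}\,g_1^2\right] + \varepsilon\,E_{f_1}\left[p_{0|1}\,g_0^2 + p_{1|1}\,g_1^2\right]} \tag{A4.21}$$

Equivalently, noting (A4.13) and (A4.14), the expression for efficacy may
be rewritten to depend only on the error probabilities:

$$\eta_{T_n} = \frac{\left\{(1-\varepsilon)\,E_{f_0}\left[(1-p_{1|0})\,g_0' + p_{1|0}\,g_1'\right] + \varepsilon\,E_{f_1}\left[p_{0|1}\,g_0' + (1-p_{0|1})\,g_1'\right]\right\}^2}{(1-\varepsilon)\,E_{f_0}\left[(1-p_{1|0})\,g_0^2 + p_{1|0}\,g_1^2\right] + \varepsilon\,E_{f_1}\left[p_{0|1}\,g_0^2 + (1-p_{0|1})\,g_1^2\right]} \tag{A4.22}$$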
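To see how switching errors erode the efficacy, the final expression of this appendix can be evaluated numerically. The sketch below is illustrative only: the function name, the error probabilities, and the noise parameters are hypothetical, and the moment tables are those of linear amplifiers $g_j(x) = x/\sigma_j^2$ in zero-mean Gaussian densities $f_k$ with variance $\sigma_k^2$.

```python
def efficacy_with_errors(eps, p10, p01, d, m):
    """Switched-detector efficacy when the switching decisions are noisy.

    p10 = Prob(choose g1 | background), p01 = Prob(choose g0 | impulsive).
    d[k][j] = E_{f_k} g_j'   and   m[k][j] = E_{f_k} g_j^2.
    """
    num = ((1 - eps) * ((1 - p10) * d[0][0] + p10 * d[0][1])
           + eps * (p01 * d[1][0] + (1 - p01) * d[1][1]))
    den = ((1 - eps) * ((1 - p10) * m[0][0] + p10 * m[0][1])
           + eps * (p01 * m[1][0] + (1 - p01) * m[1][1]))
    return num ** 2 / den

# Hypothetical parameters: 10% impulsive samples, variance ratio 100.
eps = 0.1
var = [1.0, 100.0]

# For g_j(x) = x / var_j under a zero-mean Gaussian with variance var_k:
#   E g_j' = 1 / var_j   and   E g_j^2 = var_k / var_j**2.
d = [[1 / var[j] for j in range(2)] for k in range(2)]
m = [[var[k] / var[j] ** 2 for j in range(2)] for k in range(2)]

eta_ideal = efficacy_with_errors(eps, 0.0, 0.0, d, m)    # perfect switching
eta_noisy = efficacy_with_errors(eps, 0.05, 0.05, d, m)  # 5% errors each way
eta_linear = 1 / ((1 - eps) * var[0] + eps * var[1])     # fixed linear detector
print(eta_ideal, eta_noisy, eta_linear)
```

With both error probabilities set to zero the function reduces to the error-free efficacy of Appendix 4.1. In this example the dominant loss comes from impulsive samples routed to the high-gain amplifier, since those terms enter the denominator weighted by the large variance.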
5


Approximation of
Locally Optimum
Detector Nonlinearities†


An interesting problem arising in detection is the following: given
that the true noise density $f$ and the true detector nonlinearity $g_{opt}$ are
known, what is the best way to approximate $g_{opt}$ within some specified
constraints? This chapter provides one possible solution to this broadly
posed question.

Section 1 reviews the theoretical background of the problem, and
states the objective more precisely. Section 2 presents a theorem and
proof showing the equivalence of a minimum mean square error
(minimum MSE) approximation approach and an efficacy maximizing
approach. Section 3 provides some numerical examples as illustration of
the theorem. A summary of the chapter is presented in Section 4.

† This chapter is based on work done in collaboration with K.S. Vastola of
Princeton University; a different version of this chapter appeared as a
coauthored paper [15].

1.  Introduction  and  Problem  Statement 

As discussed in Chapter 2, the locally optimal (LO) detector structure
is useful for the detection of a signal which is known, but very small
relative to the noise environment. For detecting a (constant) weak
discrete-time signal in the presence of white non-Gaussian noise with
first-order density $f$, it is well known that the LO detector consists of a
memoryless nonlinearity (ZNL) of the form

$$g_{LO}(x) = -\frac{f'(x)}{f(x)} \tag{5.1}$$

followed by summation and comparison with a threshold.

Obviously, when the functional form of $f$ is known explicitly, it is
possible to calculate the exact form of $g_{LO}$. However, it may not be
appropriate to implement the exact function $g_{LO}$; instead, it may be
desirable to implement some suboptimal nonlinearity $g$. Possible reasons
for this may be that $g$ is in some sense easier to implement or more easily
adaptable to changing noise environments. For instance, $g$ may be a ZNL
with a simple parameterization. Other considerations may be that the best
estimate of $g_{LO}$ (e.g., via density estimates) is too rough or has no closed
form representation.

When dealing with weak signal detectors, the usual measure of
performance is efficacy [1-4], which can be defined by the following equation

$$\eta_f(g) = \frac{\left[E_f(g')\right]^2}{E_f(g^2)} \tag{5.2}$$

where $E_f$ is the expectation with respect to $f$. Without loss of generality,
we assume $E_f(g) = 0$. The efficacy (5.2) can also be thought of as an
incremental signal-to-noise ratio or as the processing gain achievable
using detector nonlinearity $g$ when the noise has density $f$. In principle,
the problem discussed above may be solved by maximizing (5.2) over the
family of possible ZNLs which we choose to admit. Unfortunately, in
practice this is not often a simple thing to do, and an alternative
approach is sought.
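For a concrete sense of the quantity being maximized, the efficacy of a few candidate nonlinearities can be evaluated by numerical quadrature. The sketch below is a hedged illustration, not a procedure from the text: it assumes a Gaussian-Gaussian $\varepsilon$-mixture density with hypothetical parameters, uses a plain trapezoid rule, and computes the mean slope through the integration-by-parts identity $E_f(g') = -\int g f'\,dx$, which holds for well-behaved $g$ and $f$. All function names are made up for the example.

```python
import math

EPS, VAR0, VAR1 = 0.1, 1.0, 100.0  # hypothetical mixture parameters

def normal_pdf(x, var):
    return math.exp(-x * x / (2 * var)) / math.sqrt(2 * math.pi * var)

def f(x):
    """Gaussian-Gaussian epsilon-mixture density."""
    return (1 - EPS) * normal_pdf(x, VAR0) + EPS * normal_pdf(x, VAR1)

def f_prime(x):
    return -x * ((1 - EPS) * normal_pdf(x, VAR0) / VAR0
                 + EPS * normal_pdf(x, VAR1) / VAR1)

def g_lo(x):
    """Locally optimal nonlinearity -f'/f."""
    return -f_prime(x) / f(x)

def integrate(h, lo=-80.0, hi=80.0, steps=32000):
    """Composite trapezoid rule on [lo, hi]."""
    dx = (hi - lo) / steps
    total = 0.5 * (h(lo) + h(hi))
    for k in range(1, steps):
        total += h(lo + k * dx)
    return total * dx

def efficacy(g):
    """[E_f g']^2 / E_f g^2, with E_f g' computed as -integral of g f'."""
    slope = -integrate(lambda x: g(x) * f_prime(x))
    power = integrate(lambda x: g(x) ** 2 * f(x))
    return slope ** 2 / power

eta_lo = efficacy(g_lo)
eta_limiter = efficacy(lambda x: max(-2.0, min(2.0, x)))  # amplifier-limiter
eta_linear = efficacy(lambda x: x)
print(eta_lo, eta_limiter, eta_linear)
```

Even this small comparison shows why direct maximization is awkward: every candidate requires two integrals against the noise density, and the ratio is not a convenient function of the candidate's parameters.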

2.  Theorem  and  Discussion 

The theorem presented below yields a method for finding the best
nonlinearity over a class of suboptimum nonlinearities. Basically the
theorem states that this problem is equivalent to that of finding the
nonlinearity which is closest to $g_{LO}$ in the mean square sense. Several
related results have been obtained in recent years. For the specific
problem of designing detector quantizers, Kassam [5] and Poor and
Alexandrou [6] have shown that a close relationship exists between
maximum-efficacy quantization and quantization minimizing the mean square
distortion relative to $g_{LO}$. Also, in the more general setting of strong
mixing (dependent) noise, Halverson and Wise [7] have shown that if a
sequence of nonlinearities $\{g_n\}$ converges in mean square to $g_{LO}$, then the
efficacies $\{\eta_f(g_n)\}$ converge to the optimal efficacy $\eta_f(g_{LO})$. Note that
"mean square", as used in this context, is with respect to the measure
defined by the noise distribution.

Within the problem setting of Section 1, the following theorem is a
generalization of the results in [5] and [6] discussed above.

Theorem. Given a noise density $f$, its LO nonlinearity $g_{LO}$, and
a family $G$ of candidate suboptimum nonlinearities, the solution
$g^* \in G$ to

$$\eta_f(g^*) = \max_{g \in G}\ \eta_f(g) \tag{5.3}$$

is the same as the solution $g^* \in G$ to

$$E_f\,(g^* - g_{LO})^2 = \min_{g \in G}\ E_f\,(g - g_{LO})^2 \tag{5.4}$$

subject to a simple normalization of the elements in $G$.


Proof. Under the mild conditions of the Pitman-Noether
Theorem [1,3],

$$\eta_f(g) = \frac{\left[\int g(x)\,f'(x)\,dx\right]^2}{\int g^2(x)\,f(x)\,dx} \tag{5.5}$$

Our problem is: given a class $G$ of nonlinearities and a density
$f$, find $g^* \in G$ solving

$$\max_{g \in G}\ \eta_f(g) \tag{5.6}$$

Since the efficacy of a nonlinearity $g$ is invariant under a scale
change (i.e., $\eta_f(cg) = \eta_f(g)$ for every $c \ne 0$), we can multiply
each nonlinearity $g$ by the constant

$$c_g = -\left[\frac{\int g_{LO}^2(x)\,f(x)\,dx}{\int g^2(x)\,f(x)\,dx}\right]^{1/2} \mathrm{sgn}\left[\int g(x)\,f'(x)\,dx\right] \tag{5.7}$$

This allows us to assume, without loss of generality, that for
every $g \in G$

$$\int g^2(x)\,f(x)\,dx = \int g_{LO}^2(x)\,f(x)\,dx \tag{5.8}$$

and

$$\int g(x)\,f'(x)\,dx \le 0 \tag{5.9}$$

Now consider the MSE problem

$$\min_{g \in G} \int \left(g(x) - g_{LO}(x)\right)^2 f(x)\,dx \tag{5.10}$$

We have straightforwardly that

$$\int \left(g(x) - g_{LO}(x)\right)^2 f(x)\,dx = \int g^2(x)\,f(x)\,dx - 2\int g(x)\,g_{LO}(x)\,f(x)\,dx + \int g_{LO}^2(x)\,f(x)\,dx \tag{5.11}$$

From (5.8) and (5.1) the MSE becomes

$$= 2\left[\int g_{LO}^2(x)\,f(x)\,dx + \int g(x)\,f'(x)\,dx\right]$$

Because of (5.9), we see that minimizing this over $G$ is
equivalent to maximizing $\left[\int g(x)\,f'(x)\,dx\right]^2$. By (5.8) the
quantity $\int g^2(x)\,f(x)\,dx$ is constant over $G$; thus we have the
conclusion that minimizing the MSE (5.10) is equivalent to
maximizing the efficacy functional given in (5.5). ■


Discussion 

Thus, given $f$ and $g_{LO}$, as well as $G$, a family of approximations, the
nonlinearity $g^*$ which maximizes efficacy over the family $G$ is simply the
minimum mean square error approximation to $g_{LO}$ over $G$. Solving the
minimum MSE problem (5.4) is often easier than solving (5.3) directly,
especially when $G$ is a parameterized family.

For the purposes of the proof, each element in $G$ was multiplied by
the constant $c_g$, but in practice it is not always necessary to precondition
each member of $G$. If one were trying to solve the MSE problem over a
parameterized family of nonlinearities, say, $G = \{\varphi(x;a)\}$, with $a$ an
$m$-vector of parameters, the simplest approach is to merely treat $c_g$ as an
additional parameter controlling the scaling of $g$. The new problem then
would be to find the minimum MSE estimate of $g_{LO}$ in
$G = \{c_g\,\varphi(x;a)\}$, where the new parameterization is the
$(m+1)$-vector $(c_g, a)$. If an explicit amplitude parameter is already an
element of $a$, then this modification is unnecessary and (5.4) may be
solved directly.
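The role of the free scale parameter can be checked numerically. The sketch below is only an illustration under assumptions: the family is taken to be amplifier-limiters with clipping level $a$ (a hypothetical choice), the noise is a Gaussian-Gaussian $\varepsilon$-mixture with made-up parameters, and expectations are computed with a trapezoid rule. For each shape the MSE-optimal scale is $c = E_f[\varphi\,g_{LO}]/E_f[\varphi^2]$, and the residual MSE is $E_f[g_{LO}^2] - (E_f[\varphi\,g_{LO}])^2/E_f[\varphi^2]$, so the clipping level with the smallest MSE should also have the largest efficacy.

```python
import math

EPS, VAR0, VAR1 = 0.1, 1.0, 100.0  # hypothetical mixture parameters

def pdf(x, var):
    return math.exp(-x * x / (2 * var)) / math.sqrt(2 * math.pi * var)

def f(x):
    return (1 - EPS) * pdf(x, VAR0) + EPS * pdf(x, VAR1)

def g_lo(x):
    """Locally optimal nonlinearity -f'/f for the mixture density."""
    fp = -x * ((1 - EPS) * pdf(x, VAR0) / VAR0 + EPS * pdf(x, VAR1) / VAR1)
    return -fp / f(x)

def expect(h, lo=-80.0, hi=80.0, steps=16000):
    """E_f[h(x)] by the composite trapezoid rule."""
    dx = (hi - lo) / steps
    total = 0.5 * (h(lo) * f(lo) + h(hi) * f(hi))
    for k in range(1, steps):
        x = lo + k * dx
        total += h(x) * f(x)
    return total * dx

fisher = expect(lambda x: g_lo(x) ** 2)  # E_f[g_LO^2]

def scores(a):
    """Efficacy, scale-optimized MSE, and best scale for a limiter clipped at a."""
    phi = lambda x: max(-a, min(a, x))
    cross = expect(lambda x: phi(x) * g_lo(x))  # E_f[phi g_LO] = E_f[phi']
    power = expect(lambda x: phi(x) ** 2)       # E_f[phi^2]
    c_best = cross / power                      # MSE-optimal scale c_g
    mse = fisher - cross ** 2 / power           # min over c of E_f (c phi - g_LO)^2
    return cross ** 2 / power, mse, c_best

grid = [0.5 * k for k in range(1, 11)]          # candidate clipping levels
results = {a: scores(a) for a in grid}
a_max_efficacy = max(grid, key=lambda a: results[a][0])
a_min_mse = min(grid, key=lambda a: results[a][1])
print(a_max_efficacy, a_min_mse)
```

Because the scale is optimized in closed form for every shape, no explicit preconditioning of the family is needed, which is the practical point of treating $c_g$ as one more parameter.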

The theorem provides support for certain intuitive ideas about
suboptimal detection. Previous work with suboptimal structures [8-14]
suggests that near optimal efficacy is possible if the suboptimal structure
$g$ appears "close to" $g_{LO}$. Further refinements making $g$ "closer" to $g_{LO}$
yield only minor improvements in performance. Since efficacy is directly
related to the mean square error between $g$ and $g_{LO}$, it is easy to see that
small errors in $g$ (relative to $g_{LO}$) tend to be deemphasized, at the
expense of emphasizing the gross errors. Furthermore, the square
errors are weighted by the noise density; for unimodal densities, points
in the tail region are weighted much less heavily than those near the
mode.

These points illustrate why a great deal of latitude is available to the
designer in choosing the tail behavior of $g$, while the shape of the central
region must be chosen much more carefully. In particular, for heavy-tailed
noises, reasonable performance levels may be attained by carefully
matching the shapes of $g$ and $g_{LO}$ near the noise mean, and choosing
more roughly the limiting or blanking behavior of the tail regions [8-14].
Also, note that the adaptive nonlinearities of Chapter 2 typically were
good matches to $g_{LO}$ near the noise mean, but only loosely approximated
the tails of $g_{LO}$. In the examples given, these suboptimal adaptive
nonlinearities achieved high levels of performance with respect to the
optimal nonlinearity. Additionally, in Chapter 3, the only nonlinearities
that were substantially suboptimal were cases in which there was a poor
fit near the origin.


3.  Examples 


Known  Density 

Since maximizing efficacy is the same as solving the MSE problem,
the best approximation in $G$ is the projection of $g_{LO}$ onto $G$. As an
illustration of this point, suppose $G$ is the span of a finite set of basis
functions $\varphi_i$, with $i = 1, \ldots, N$, where the $\varphi_i$ are orthonormal
with respect to $f$. An approximation $g$ will take the form

$$g = \sum_{i=1}^{N} a_i\,\varphi_i \tag{5.12}$$

where the $a_i$ are not all zero. Solving (5.3) directly requires simultaneous
solution for $\{a_i\}$ in an $N$-dimensional quadratic form. Solving (5.4) leads
to the solution $a_i = E_f(g_{LO}\,\varphi_i)$ for $i = 1, \ldots, N$.

This approach is probably most useful in an analytical context, for
detailed knowledge of $f$ is necessary to generate the orthonormal basis
set. If $f$ is not available, a set of $N$ basis functions may still be generated
provided $2N$ moments of $f$ are known [17].


Unknown  Density 

In this example, the theorem is applied to smooth an estimate of $g_{LO}$.
Using a finite number of noise observations, the kernel density
estimation procedure of Parzen and Rosenblatt [16,18,19] is used to give
$\hat f$ and $\hat f'$, estimates of the density and its first derivative. The LO
nonlinearity may then be estimated as $\hat g_{LO}(x) = -\hat f'(x)/\hat f(x)$. Unless $N$
is very large, $\hat g_{LO}$ will be rough, and it will be desirable to find
$\tilde g_{LO}$, a smoothed version of the estimated nonlinearity. By the theorem,
a smoothing technique based on a minimum MSE criterion would yield the
best performing $\tilde g_{LO}$.

Consider the following numerical example, where the $\{x_i\}$ are 100 iid
observations of a zero mean, unit variance noise process with
Gaussian-Gaussian $\varepsilon$-mixture density, $\varepsilon = 0.1$, and
$\sigma_1^2/\sigma_0^2 = 100$. Using the finite width polynomial kernel and window
sizing procedure discussed by Silverman [16], both $\hat f$ and $\hat f'$ were
estimated, and $\hat g_{LO}$ was computed. Figures 5.1 and 5.2 compare $\hat f$ to
the true density.


To smooth $\hat g_{LO}$, it was projected onto

$$G = \mathrm{span}\left\{\frac{e^{-x^2/2}}{\sqrt{2\pi}}\cdot\left(1,\ x,\ x^2,\ x^3\right)\right\}.$$

In a practical problem, $f$ is unknown, so the expectations are computed with
respect to the empirical cdf. Solving the MSE problem (5.4) requires the
simultaneous solution of four linear equations. The result is a smoothed
estimate

$$\tilde g_{LO}(x) = \left(\beta_3 x^3 + \beta_2 x^2 + \beta_1 x + \beta_0\right)\frac{e^{-x^2/2}}{\sqrt{2\pi}}$$

Figure 5.3 compares $\hat g_{LO}$ and $\tilde g_{LO}$, and Figure 5.4 compares
$\tilde g_{LO}$ and the true LO nonlinearity $g_{LO}$. For this example,
$\mathrm{ARE}_{\tilde g_{LO},\,\hat g_{LO}} = 8.58$ and $\mathrm{ARE}_{\tilde g_{LO},\,g_{LO}} = .951$, where ARE
is as defined in Chapter 2.

Fig. 5.1. Estimated density $\hat f$ (broken line) and the true
Gaussian-Gaussian $\varepsilon$-mixture density $f$ (solid line).

Fig. 5.2. Comparison of $\hat f$ (broken line) and $f$ (solid line) on
a logarithmic scale.

In this example, $G$ is not orthonormal with respect to the noise
density. It was chosen for convenience and "nice" smoothness properties.
This example, and work by Modestino [20], suggest that elements of $G$
could be various generic detector nonlinearities, where the coefficients
$\beta_i$ would weight the contribution of each nonlinearity. Some adaptive
procedure could observe the noise and update the coefficients $\beta_i$.
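The projection step in this example amounts to solving four normal equations whose entries are sample averages. The sketch below illustrates that computation under explicit assumptions: the true LO nonlinearity of a unit-variance mixture stands in for the rough kernel-based estimate (no kernel estimator is implemented here), the sample is simulated, and the helper names `basis`, `solve`, and `project` are hypothetical.

```python
import math
import random

EPS = 0.1
VAR0 = 1 / 10.9            # background variance giving unit total variance
VAR1 = 100 * VAR0          # impulsive variance (ratio 100)

def pdf(x, var):
    return math.exp(-x * x / (2 * var)) / math.sqrt(2 * math.pi * var)

def target(x):
    """Stand-in for the rough estimate: the true -f'/f of the mixture."""
    f = (1 - EPS) * pdf(x, VAR0) + EPS * pdf(x, VAR1)
    fp = -x * ((1 - EPS) * pdf(x, VAR0) / VAR0 + EPS * pdf(x, VAR1) / VAR1)
    return -fp / f

def basis(x):
    """The four functions x^k exp(-x^2/2)/sqrt(2 pi), k = 0..3."""
    w = math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
    return [w, x * w, x * x * w, x ** 3 * w]

def solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def project(samples):
    """Coefficients beta_0..beta_3 with expectations taken as sample means."""
    n = len(samples)
    rows = [basis(x) for x in samples]
    gram = [[sum(r[i] * r[j] for r in rows) / n for j in range(4)] for i in range(4)]
    rhs = [sum(r[i] * target(x) for r, x in zip(rows, samples)) / n for i in range(4)]
    return solve(gram, rhs)

random.seed(1)
samples = [random.gauss(0.0, math.sqrt(VAR1 if random.random() < EPS else VAR0))
           for _ in range(100)]
beta = project(samples)

def smoothed(x):
    return sum(b * p for b, p in zip(beta, basis(x)))

fit_mse = sum((smoothed(x) - target(x)) ** 2 for x in samples) / len(samples)
baseline = sum(target(x) ** 2 for x in samples) / len(samples)
print(beta, fit_mse, baseline)
```

An adaptive procedure of the kind suggested above would recompute the two sample-average arrays as new noise observations arrive and re-solve the same small system.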

4. Conclusion

When  replacing  a  known  locally  optimal  nonlinearity  with  some 
suboptimal  nonlinearity,  it  is  desirable  to  have  a  method  which  is  simple 
and  generates  a  nonlinearity  which  preserves  a  high  performance  level. 
We  have  presented  a  proof  of  the  equivalence  of  efficacy  maximization 
and  mean  square  error  approximation.  MSE  minimizing  procedures  have 
many  appealing  properties,  and  they  have  a  rich  history  in  both  theory 
and  application.  Often  relatively  simple  algorithms  may  be  found  for  car¬ 
rying  out  the  calculations,  and  it  is  possible  that  these  methods  may  now 
be  applied  fruitfully  to  the  problem  of  designing  maximum  efficacy 
suboptimal  detector  nonlinearities. 

There are several other useful interpretations of the theorem. The
first is that, since the MSE performance measure involves only a single
integral, we can study the contribution of an isolated region of the
nonlinearity to overall mean square error, and therefore, its relative
contribution to performance degradation. As an example, this allows us to
examine the sensitivity of performance with respect to changes in the
nonlinearity over certain regions of the input axis. Often, the behavior of
a nonlinearity's tail region is of particular interest, and the simple
ranking of performance sensitivities afforded by the use of (5.10) would
allow the relative merits of various tail configurations to be studied
independently of the shape of the rest of the nonlinearity. Zero mean square
error in the tail region would indicate that the tail is "locally optimum in
that region", and therefore provides the best possible contribution to
overall performance.

One area of interest still open is the question of approximating the
small sample (Neyman-Pearson) detector. It would be worthwhile
investigating the properties of minimum MSE approximations to the NP
detector nonlinearity.

Fig. 5.3. Comparison of the estimated nonlinearity $\hat g_{LO}$ (broken
line) and the smoothed estimate $\tilde g_{LO}$ (solid line).


Detection
and Small Sample
Performance Measurement


Previous chapters have been concerned mainly with locally optimum
(LO) detection. As pointed out, LO detection may be regarded as a
limiting worst case, optimal only in an asymptotic sense. For finite sample
sizes and nonzero signal-to-noise ratios, Neyman-Pearson detection is
optimal in a particular sense. Efficacy is a useful asymptotic
performance measure, but it does not give much information about the small
sample size performance of a detector.

This chapter will be concerned with developing a performance
measure useful for comparing finite sample detectors which approximate the
NP optimal detector. Section 1 reviews the theoretical background of
this problem, and develops the properties of the proposed performance
measure, and Section 2 presents some examples applying the result.
Section 3 provides a brief conclusion to the chapter.


1.  Analysis  of  the  Performance  Index  Properties 


Introduction  and  Theoretical  Preliminaries 


Consider the binary hypothesis testing problem:

$$H_0:\ x \sim f_0(x), \qquad H_1:\ x \sim f_1(x), \qquad x = (x_1, \ldots, x_n) \in X^n \tag{6.1}$$

A straightforward application of the Neyman-Pearson Lemma [1, p. 193]
leads to a threshold test of the form

$$\Lambda_{NP}(x) = \frac{f_1(x)}{f_0(x)} \underset{H_0}{\overset{H_1}{\gtrless}} T \tag{6.2}$$

This test is optimal in the sense that for any probability of false alarm
$\alpha \le \alpha_0$ of incorrectly deciding $H_1$ when $H_0$ is true, the probability
$\beta$ of correctly deciding $H_1$ when $H_1$ is true is at least as great as that
of any other test with level $\alpha \le \alpha_0$. Often, $\beta$ is called the power of
the test. Alternatively, the measure $1-\beta$ is sometimes of interest, and is
designated as the probability of false dismissal.

As noted in Chapter 2, the statistics $\alpha_0$ and $\beta$ are difficult to
compute. However, one approach to describing the performance of the test
(6.2) is to find bounds on $\alpha_0$ and $1-\beta$ based upon measures of distance
between $f_0$ and $f_1$, such as the Chernoff distance [2]. Kailath [3] provides
a summary of classical approaches, and Blahut [4] explores distance
measures and some connections between hypothesis testing and coding.
An extensive example of Chernoff bounding is available in Van Trees
[7, pp. 116-133].

These techniques are useful when $f_0$ and $f_1$ are known exactly, but
unfortunately, this is not often the case. Furthermore, by force or
choice, the likelihood ratio test (6.2) may be altered by replacing $f_0$ and
$f_1$ with incorrect densities $p_0$ and $p_1$. Kazakos [5,6] considers the use of
distance-measure-like bounding techniques for hypothesis tests based on
inaccurate versions of the true densities.

The contribution of this chapter is to extend some results on
distance bounding and bounding for detection under mismatch to the more
general situation where the likelihood ratio is replaced by a general
transformation not necessarily defined by the ratio of two unique
densities. It will be useful to make the transformations

$$\lambda_{NP}(x) = \ln \Lambda_{NP}(x) \tag{6.3}$$

$$t = \ln T \tag{6.4}$$

and consider the Neyman-Pearson test

$$\lambda_{NP}(x) \underset{H_0}{\overset{H_1}{\gtrless}} t \tag{6.5}$$

The log-likelihood ratio $\lambda_{NP}(x)$ will be replaced by a general detection
processor $g(x)$. The following regularity conditions are assumed with
respect to both the measures induced by density functions $f_0$ and $f_1$:

(a) $-\infty < g(x) < \infty$ a.e. in $X^n$.

(b) $f_0(x) \ne f_1(x)$ for some subset of $X^n$ with nonzero measure.
Thus, distinctness of the hypotheses is assured. Additionally,
it is required that the measures induced by $f_0$ and $f_1$ both
be absolutely continuous with respect to each other. This
implies that $f_0$ and $f_1$ have common support, and that the
detection problem is not singular.

(c) $-\infty < E_0\,g < E_1\,g < \infty$. Therefore, distinctness of the
hypotheses after processing by the detector is assured. This
mild condition merely restricts the processor $g$ to be
"reasonable": observations under $H_1$ tend to generate a larger
valued test statistic than observations under $H_0$.

The regularity conditions ensure that $g(x)$ exists w.p. 1 under either $H_0$ or
$H_1$. It will be assumed that these regularity conditions are satisfied by
all detectors and densities considered in the remainder of this chapter.
Using the generalized detection processor $g$, the likelihood ratio test
(6.2) becomes

$$g(x) \underset{H_0}{\overset{H_1}{\gtrless}} t \tag{6.6}$$

As an aside, note that $g(x)$ has several common realizations. For
instance, it may be the output of a matched filter, or its approximation.
In other cases, $g(x) = \sum_{i=1}^{n} g_i(x_i)$, where $g_i$ is a memoryless nonlinear
transformation. When the observations $\{x_i\}_{i=1}^{n}$ are independent,

$$\Lambda_{NP}(x) = \prod_{i=1}^{n} \Lambda_{NP,i}(x_i) \tag{6.7}$$

where $\Lambda_{NP,i}(x_i)$ is the ratio of the marginal densities of observation
$x_i$. It follows from the monotonicity of the logarithm function that (6.6)
is an NP optimal test when $g_i(x_i) = \lambda_{NP,i}(x_i)$, the log-likelihood ratio.
Memoryless transformations other than the log-likelihood ratio may be
used as $g_i$, and they may be generated by methods similar to those
proposed in the previous chapters, particularly when the noise density is
assumed to be stationary.

Exposition  of  the  Performance  Index 

For the remainder of this chapter, we consider the binary
hypothesis test of (6.1), and assume a decision will be made according to a
test (6.6), where the regularity conditions are satisfied and $g(x)$ is not
necessarily equal to $\lambda_{NP}(x)$. As a first step in developing a performance
index for the test (6.6), consider the functionals given by
Definition 1.

$$M_0(u;g) = \ln E_0\left\{e^{u\,g(x)}\right\} \tag{6.8}$$

$$M_1(u;g) = \ln E_1\left\{e^{-u\,g(x)}\right\} \tag{6.9}$$

Notice that both $M_0$ and $M_1$ are cumulant generating functions, since
they are the natural logarithms of the moment generating function (mgf)
for the random variables produced by the transformation $g(x)$ or $-g(x)$,
respectively. Thus, a necessary and sufficient condition for $M$ to exist and
be finite is that, for $u$ in some neighborhood about the origin, the mgf of
$g$ exists. A necessary condition for finite $M$ to exist is simply that $g(x)$
has finite moments of all orders.

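The following theorem provides bounds on the error probabilities.

Theorem 1. Let a test of the form (6.6) be used to distinguish between two hypotheses of the form (6.1), and assume $M_0$ and $M_1$ are defined as above. If $M_0$ and $M_1$ exist and are finite, then

\[ \alpha_0 \le e^{-ut + M_0(u;g)} \tag{6.10} \]

\[ 1-\beta \le e^{ut + M_1(u;g)} \tag{6.11} \]

Proof (after [5]). For $u > 0$,

\begin{align*}
\alpha_0 &= \mathrm{Prob}\left[ g(\mathbf{x}) > t \mid H_0 \right] \\
&= \mathrm{Prob}\left[ e^{u g(\mathbf{x})} > e^{ut} \mid H_0 \right] \\
&= \mathrm{Prob}\left[ e^{u (g(\mathbf{x}) - t)} > 1 \mid H_0 \right] \\
&\le E_0\left[ e^{u (g(\mathbf{x}) - t)} \right] = e^{-ut + M_0(u;g)}
\end{align*}

by the Markov inequality. The proof for the inequality on $1-\beta$ follows in a similar manner. ■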
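The bounds (6.10) and (6.11) can be checked numerically in a case where the exact error probabilities are known. The sketch below is only an illustration under assumed values: a single observation, unit-variance Gaussian noise, the linear nonlinearity $g(x) = \theta(x - \theta/2)$, and the arbitrary choices $\theta = 3$, $t = 0$, $u = 1/2$. The helper names are hypothetical and the integrals are approximated by a plain trapezoid rule.

```python
import math

def gaussian_pdf(x, mean=0.0):
    """Unit-variance Gaussian density centered at `mean`."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)

def gaussian_cdf(x):
    """Standard Gaussian cumulative distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def log_expectation(fn, pdf, lo=-15.0, hi=15.0, steps=30000):
    """ln of the integral of fn(x) * pdf(x) over [lo, hi] (trapezoid rule)."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        x = lo + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * fn(x) * pdf(x)
    return math.log(total * h)

# Illustrative values only (assumptions, not taken from the text).
theta, t, u = 3.0, 0.0, 0.5

def g(x):
    """Linear detector nonlinearity for a shift of size theta."""
    return theta * (x - theta / 2.0)

# M0(u; g) = ln E0[exp(u g)],  M1(u; g) = ln E1[exp(-u g)]
M0 = log_expectation(lambda x: math.exp(u * g(x)), gaussian_pdf)
M1 = log_expectation(lambda x: math.exp(-u * g(x)),
                     lambda x: gaussian_pdf(x, theta))

alpha_bound = math.exp(-u * t + M0)   # right side of (6.10)
miss_bound = math.exp(u * t + M1)     # right side of (6.11)

# Exact values: g(x) > t is equivalent to x > t/theta + theta/2.
x_threshold = t / theta + theta / 2.0
alpha_exact = 1.0 - gaussian_cdf(x_threshold)
miss_exact = gaussian_cdf(x_threshold - theta)

print(f"false alarm:     exact {alpha_exact:.4f} <= bound {alpha_bound:.4f}")
print(f"false dismissal: exact {miss_exact:.4f} <= bound {miss_bound:.4f}")
```

For a single observation the bounds are loose; their value lies in how they scale with the number of observations, not in their tightness at $n = 1$.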


The two functionals may be combined to provide a useful performance measure for comparing the false dismissal error probabilities of two competing detector structures operating with equal false alarm rates. Before this is illustrated, it is first necessary to develop

Lemma 1. If $M_0$ and $M_1$ exist, define the function $r(u;g) = M_0(u;g) + M_1(u;g)$. Then

(i) \[ r^*(g) = r(u^*;g) = \min_u \left[ M_0(u;g) + M_1(u;g) \right] \tag{6.12} \]
exists for some finite $u^*$.

(ii) The minimum value satisfies $r(u^*;g) < 0$.

(iii) For any $g$, the value of $u^*$ is unique.

Proof.†

† In this proof, we employ a.e. as the abbreviation of almost everywhere. $f \ll g$ means that the measure induced by $f$ is absolutely continuous with respect to the measure induced by $g$. If $f \ll g$ and $g \ll f$, then the induced measures are equivalent, and the condition is denoted as $f \equiv g$. For convenience, the phrase "with respect to the measure induced by" will be suppressed in the text.

(i) Since $M_0$ and $M_1$ are cumulant generating functions, they are convex in $u$ [8, p. 121]; therefore $r$ is also convex. Observe that $r(0;g) = 0$; it may be shown that $\lim_{|u|\to\infty} r(u;g) = \infty$.

First, we rewrite the definition of $r(u;g)$ as

\[ r(u;g) = \ln\left[ e^{-uC} \int e^{u(g+C)} f_0 \right] + \ln\left[ e^{uC} \int e^{-u(g+C)} f_1 \right] \]

where $C$ is some constant. The region of integration may be partitioned, giving

\begin{align*}
r(u;g) = {} & -uC + uC + \ln\left[ \int_{(g+C)>0} e^{u(g+C)} f_0 + \int_{(g+C)\le 0} e^{u(g+C)} f_0 \right] \\
& + \ln\left[ \int_{(g+C)\ge 0} e^{-u(g+C)} f_1 + \int_{(g+C)<0} e^{-u(g+C)} f_1 \right].
\end{align*}

We now consider separately the case $u > 0$. It follows that

\[ r(u;g) \ge \ln \int_{(g+C)>0} e^{u(g+C)} f_0 + \ln \int_{(g+C)<0} e^{-u(g+C)} f_1 \]

as each partitioned integral is nonnegative. Regularity condition (c) implies that $g$ cannot be a constant a.e. with respect to $f_0$ or $f_1$. Therefore, for some $C$, the regularity condition that $f_0 \equiv f_1$ ensures that there exists $\varepsilon > 0$ such that

\[ 0 \ne \int_{g+C>\varepsilon} f_0 \quad \text{and} \quad 0 \ne \int_{g+C<-\varepsilon} f_1 . \]

Because such an $\varepsilon > 0$ exists,

\begin{align*}
r(u;g) &\ge \ln \int_{g+C>\varepsilon} e^{u\varepsilon} f_0 + \ln \int_{g+C<-\varepsilon} e^{u\varepsilon} f_1 \\
&= 2u\varepsilon + \ln \int_{g+C>\varepsilon} f_0 + \ln \int_{g+C<-\varepsilon} f_1 .
\end{align*}

The latter function grows without bound as $u$ approaches infinity.

For the case $u < 0$, similar arguments show that $r(u;g)$ grows without bound as $u$ approaches negative infinity also; therefore, since $r(u;g)$ is convex, some finite $u^*$ exists that minimizes $r(u;g)$.

(ii) Since $r$ is convex in $u$, to show $u^* > 0$ and $r(u^*;g) < 0$, it will be sufficient to demonstrate that $\left. \partial r / \partial u \right|_{u=0} < 0$. Here,

\[ \frac{\partial r}{\partial u} = \frac{m_0'(u)}{m_0(u)} + \frac{m_1'(u)}{m_1(u)} \]

where the $m(u)$ are moment generating functions. Thus, $m_0(0) = m_1(0) = 1$, $m_0'(0) = E_0(g)$, and $m_1'(0) = E_1(-g)$, which gives

\[ \left. \frac{\partial r}{\partial u} \right|_{u=0} = E_0(g) + E_1(-g) . \]

Regularity condition (c) ensures that this quantity is negative. Therefore, the minimum value of $r(u;g)$ exists for some $u^* > 0$, and this minimum value $r^*(g)$ is less than zero.

(iii) To demonstrate that $u^*$ is unique, it will be sufficient to show that the second partial derivative of $r(u;g)$ with respect to $u$ is strictly positive for all $u$ and arbitrary $g$. The second partial derivative may be written as

\[ \frac{\partial^2 r}{\partial u^2} = \frac{\int g^2 e^{ug} f_0}{\int e^{ug} f_0} - \left[ \frac{\int g\, e^{ug} f_0}{\int e^{ug} f_0} \right]^2 + \frac{\int g^2 e^{-ug} f_1}{\int e^{-ug} f_1} - \left[ \frac{\int g\, e^{-ug} f_1}{\int e^{-ug} f_1} \right]^2 . \]

Notice that the functions

\[ f_{0e} = \frac{e^{ug} f_0}{\int e^{ug} f_0} \quad \text{and} \quad f_{1e} = \frac{e^{-ug} f_1}{\int e^{-ug} f_1} \]

are density functions also. The expectations with respect to these two new densities will be denoted by $E_{0e}$ and $E_{1e}$, respectively. The second derivative may be expressed in terms of these expectations as

\[ \frac{\partial^2 r}{\partial u^2} = \left( E_{0e}\, g^2 - E_{0e}^2\, g \right) + \left( E_{1e}\, g^2 - E_{1e}^2\, g \right) \]

which is the sum of the variances of $g$ under $f_{0e}$ and $f_{1e}$, respectively. The regularity conditions ensure that $g$ is finite a.e. and not a.e. a constant; therefore $f_{0e} \equiv f_0$ and $f_{1e} \equiv f_1$. Then $g$ is not a.e. a constant with respect to $f_{0e}$ and $f_{1e}$, and its respective variances are nonzero. Thus, when $r(u;g)$ exists, $\partial^2 r / \partial u^2$ is strictly positive. ■


The previous lemma demonstrates that for a given $g$ it is possible to find the minimum value $r^*(g)$, which shall be designated as a performance index of $g$. The reason for this will be clear from

Theorem 2. If $M_0$ and $M_1$ exist, then $1-\beta \le \frac{1}{\alpha_0} e^{r(u;g)}$ for $u > 0$, and there exists a tightest bound

\[ 1-\beta \le \frac{1}{\alpha_0} e^{r^*(g)} \tag{6.13} \]

Proof.

\begin{align*}
\alpha_0 &\le \exp(-ut + M_0) \\
\ln \alpha_0 &\le -ut + M_0 \\
t &\le \frac{M_0 - \ln \alpha_0}{u}
\end{align*}

We substitute this result into the bound on false dismissal probability:

\begin{align*}
1-\beta &\le \exp(ut + M_1) \\
&\le \exp(M_0 - \ln \alpha_0 + M_1) \\
&= \frac{1}{\alpha_0} e^{M_0 + M_1}
\end{align*}

Lemma 1 guarantees that $r^*(g)$ exists for a unique value $u^* > 0$. Thus, it follows that a tightest bound $\frac{1}{\alpha_0} e^{r^*(g)}$ exists. ■
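As an illustration of how the index $r^*(g)$ in (6.12) might be evaluated in practice, the sketch below minimizes $r(u;g)$ over $u$ with a golden-section search, which is adequate because $r$ is convex in $u$. The example case, a sign detector in unit-variance Gaussian noise with $\theta = 1$, is an assumption chosen because $r(u;g)$ and $r^*(g)$ then have the closed forms listed in Appendix 6.1 as (A6.4) and (A6.5); the function names are hypothetical.

```python
import math

def gaussian_cdf(x):
    """Standard Gaussian cumulative distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def r_sign_gaussian(u, theta):
    """r(u; g_sd, f_G) = 2 ln[exp(-u) Phi(theta/2) + exp(u) Phi(-theta/2)]."""
    p = gaussian_cdf(theta / 2.0)
    return 2.0 * math.log(math.exp(-u) * p + math.exp(u) * (1.0 - p))

def golden_section_min(fn, lo, hi, tol=1e-10):
    """Minimize a unimodal function on [lo, hi]; returns (argmin, min)."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if fn(c) < fn(d):
            b, d = d, c
            c = b - inv_phi * (b - a)
        else:
            a, c = c, d
            d = a + inv_phi * (b - a)
    x = 0.5 * (a + b)
    return x, fn(x)

theta = 1.0  # illustrative signal amplitude (assumption)
u_star, r_star = golden_section_min(lambda u: r_sign_gaussian(u, theta), 0.0, 10.0)

# Closed forms for comparison.
p = gaussian_cdf(theta / 2.0)
u_closed = 0.5 * math.log(p / (1.0 - p))
r_closed = 2.0 * math.log(2.0) + math.log(p) + math.log(1.0 - p)

alpha0 = 0.1  # illustrative false alarm rate (assumption)
print(f"u* = {u_star:.6f} (closed form {u_closed:.6f})")
print(f"r* = {r_star:.6f} (closed form {r_closed:.6f})")
print(f"bound (6.13) for one observation: {math.exp(r_star) / alpha0:.4f}")
```

The search interval $[0, 10]$ is an arbitrary choice that happens to contain $u^*$ here; Lemma 1 only guarantees that some finite $u^* > 0$ exists.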


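If $r^*(g)$ is to be a useful performance index for comparing detectors, it must give the best index for the optimal detector structure. Demonstration of this fact will require

Lemma 2. Suppose $G$ is a convex set of functions on $X^n$ satisfying regularity conditions (a) and (c). Suppose $M_0$ and $M_1$ exist for all $g \in G$. Then $M_0$ and $M_1$ are convex on $G$.

Proof. To demonstrate convexity of the two functionals it will be sufficient to show that

\[ M_0(u; \delta g + [1-\delta] h) \le \delta M_0(u;g) + [1-\delta] M_0(u;h) \]

for $0 \le \delta \le 1$ and $g, h \in G$.

We begin by recalling Hölder's Inequality:

\[ \text{if } \frac{1}{p} + \frac{1}{q} = 1, \text{ then } E|XY| \le \left[ E|X|^p \right]^{1/p} \left[ E|Y|^q \right]^{1/q} . \]

The inequality is applied to the definition of $M_0$, with $p = \frac{1}{\delta}$ and $q = \frac{1}{1-\delta}$:

\begin{align*}
M_0(u; \delta g + [1-\delta] h) &= \ln \int e^{u \delta g(\mathbf{x})} e^{u (1-\delta) h(\mathbf{x})} f_0(\mathbf{x})\, d\mathbf{x} \\
&\le \ln \left\{ \left[ \int e^{u g(\mathbf{x})} f_0(\mathbf{x})\, d\mathbf{x} \right]^{\delta} \left[ \int e^{u h(\mathbf{x})} f_0(\mathbf{x})\, d\mathbf{x} \right]^{1-\delta} \right\} \\
&= \delta \ln \int e^{u g(\mathbf{x})} f_0(\mathbf{x})\, d\mathbf{x} + (1-\delta) \ln \int e^{u h(\mathbf{x})} f_0(\mathbf{x})\, d\mathbf{x} \\
&= \delta M_0(u;g) + [1-\delta] M_0(u;h)
\end{align*}

The proof for $M_1$ is identical in form. ■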

Theorem 3. Let $G$ be the set of all functions on $X^n$ satisfying (a) and (c). Then the function $r(u;g)$ achieves a globally minimum value for $g(\mathbf{x}) = \lambda_{NP}(\mathbf{x})$ and $u = \tfrac{1}{2}$.

Proof. First, note that if $g(\mathbf{x}) \in G$, then $C g(\mathbf{x}) \in G$ for any constant $C > 0$. Therefore, minimizing $r(u;g)$ over $R^+ \times G$ is equivalent to minimizing $r(\tfrac{1}{2};g)$ over $G$. To prove the theorem, it will be sufficient to fix $u = \tfrac{1}{2}$ and attend to the minimization problem in $G$.

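To prove existence of a stationary point at $g(\mathbf{x}) = \lambda_{NP}(\mathbf{x})$, a calculus of variations argument will be used. Let $\delta(\mathbf{x})$ be any arbitrary variation which is not a.e. a constant, and let $\varepsilon$ be a real number; further, let $\delta(\mathbf{x})$ be subject to the restriction that the perturbed nonlinearity

\[ g(\mathbf{x}) = \lambda_{NP}(\mathbf{x}) + \varepsilon \delta(\mathbf{x}) \]

remains an element of $G$. If $\varepsilon \delta(\mathbf{x})$ is a.e. a constant, then $r(\tfrac{1}{2}; \lambda_{NP}) = r(\tfrac{1}{2}; \lambda_{NP} + \varepsilon \delta)$.

The functional $r(u; \lambda_{NP} + \varepsilon \delta)$ may be written

\begin{align*}
r(u;g) = {} & \ln \int e^{u [\lambda_{NP}(\mathbf{x}) + \varepsilon \delta(\mathbf{x})]} f_0(\mathbf{x})\, d\mathbf{x} \\
& + \ln \int e^{-u [\lambda_{NP}(\mathbf{x}) + \varepsilon \delta(\mathbf{x})]} f_1(\mathbf{x})\, d\mathbf{x} .
\end{align*}

For the remainder of the proof, the dependence on $\mathbf{x}$ will be suppressed in the notation. Taking the first derivative with respect to $\varepsilon$ yields

\[ \frac{\partial r}{\partial \varepsilon} = \frac{u \int \delta\, e^{u [\lambda_{NP} + \varepsilon \delta]} f_0}{\int e^{u [\lambda_{NP} + \varepsilon \delta]} f_0} - \frac{u \int \delta\, e^{-u [\lambda_{NP} + \varepsilon \delta]} f_1}{\int e^{-u [\lambda_{NP} + \varepsilon \delta]} f_1} . \]

A necessary condition for a stationary point in $r$ to exist at $\lambda_{NP}$ is that $\left. \partial r / \partial \varepsilon \right|_{\varepsilon = 0} = 0$ for all possible variations $\delta$. Therefore, the condition for a minimum is

\[ 0 = \frac{\int \delta\, e^{u \lambda_{NP}} f_0}{\int e^{u \lambda_{NP}} f_0} - \frac{\int \delta\, e^{-u \lambda_{NP}} f_1}{\int e^{-u \lambda_{NP}} f_1} . \]

But $\lambda_{NP} = \ln \frac{f_1}{f_0}$ and $u = \tfrac{1}{2}$, and the necessary condition becomes

\[ 0 = \frac{\int \delta\, (f_0 f_1)^{1/2}}{\int (f_0 f_1)^{1/2}} - \frac{\int \delta\, (f_1 f_0)^{1/2}}{\int (f_1 f_0)^{1/2}} \]

which is fulfilled for any arbitrary variation $\delta$. Therefore, the conclusion is that

\[ \left. \frac{\partial}{\partial \varepsilon}\, r(\tfrac{1}{2}; \lambda_{NP} + \varepsilon \delta) \right|_{\varepsilon = 0} = 0 . \]

It is easy to show that $G$ is convex; hence, Lemma 2 implies that this stationary point is a global minimum [13, p. 191].

As an aside, note that the global minimum value is achieved for any pair of $u$ and $g$ such that $u g = \tfrac{1}{2} \lambda_{NP} + C$ almost everywhere for any constant $C$. Thus, the pair $(\tfrac{1}{2}, \lambda_{NP})$ attaining the globally minimum value $r(\tfrac{1}{2}; \lambda_{NP})$ is not unique. ■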
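Theorem 3 can be probed numerically in a simple case. The sketch below is an illustration under assumed conditions, not part of the proof: a single observation, unit-variance Gaussian densities with shift $\theta = 1$, and the arbitrary non-constant variation $\delta(x) = \sin x$. It evaluates $r(\tfrac{1}{2}; \lambda_{NP})$ by trapezoid integration and compares it with the perturbed values $r(\tfrac{1}{2}; \lambda_{NP} + \varepsilon\delta)$ and with $r(u; \lambda_{NP})$ at another $u$. All names are hypothetical.

```python
import math

def gaussian_pdf(x, mean=0.0):
    """Unit-variance Gaussian density centered at `mean`."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)

def integrate(fn, lo=-15.0, hi=15.0, steps=30000):
    """Trapezoid rule on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (fn(lo) + fn(hi))
    for k in range(1, steps):
        total += fn(lo + k * h)
    return total * h

theta = 1.0  # illustrative shift (assumption)

def f0(x):
    return gaussian_pdf(x)

def f1(x):
    return gaussian_pdf(x, theta)

def lam_np(x):
    """Log-likelihood ratio ln(f1/f0)."""
    return math.log(f1(x) / f0(x))

def r(u, g):
    """r(u; g) = M0(u; g) + M1(u; g) for a single observation."""
    m0 = math.log(integrate(lambda x: math.exp(u * g(x)) * f0(x)))
    m1 = math.log(integrate(lambda x: math.exp(-u * g(x)) * f1(x)))
    return m0 + m1

r_np = r(0.5, lam_np)

# Perturb the nonlinearity with a non-constant variation delta(x) = sin(x).
perturbed = [r(0.5, lambda x, e=eps: lam_np(x) + e * math.sin(x))
             for eps in (-0.5, 0.5)]

# Move u away from 1/2 while keeping g = lambda_NP.
r_other_u = r(0.3, lam_np)

print(f"r(1/2; lambda_NP)       = {r_np:.6f}")
print(f"perturbed nonlinearity  = {[round(v, 6) for v in perturbed]}")
print(f"r(0.3; lambda_NP)       = {r_other_u:.6f}")
```

In this Gaussian case $r(\tfrac{1}{2}; \lambda_{NP}) = 2 \ln \int (f_0 f_1)^{1/2} = -\theta^2/4$, which gives an independent check on the integration.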

In a binary hypothesis test, the performance of the test is unaffected by a monotone transformation of the test statistic. Here, the weaker property of the invariance of $r^*(g)$ to linear transformations of the test is demonstrated.

Proposition 1. $r^*(g) = r^*(a + bg)$, where the variables $a$ and $b$ are real numbers, and $b \ne 0$.

Proof.

\begin{align*}
r(u; a+bg) &= M_0(u; a+bg) + M_1(u; a+bg) \tag{6.14} \\
&= \ln \int e^{u (a + b g(\mathbf{x}))} f_0(\mathbf{x})\, d\mathbf{x} + \ln \int e^{-u (a + b g(\mathbf{x}))} f_1(\mathbf{x})\, d\mathbf{x} \\
&= \ln e^{ua} \int e^{u b g(\mathbf{x})} f_0(\mathbf{x})\, d\mathbf{x} + \ln e^{-ua} \int e^{-u b g(\mathbf{x})} f_1(\mathbf{x})\, d\mathbf{x} \\
&= ua - ua + \ln \int e^{w g(\mathbf{x})} f_0(\mathbf{x})\, d\mathbf{x} + \ln \int e^{-w g(\mathbf{x})} f_1(\mathbf{x})\, d\mathbf{x}, \quad w = ub \\
&= r(w; g) \tag{6.15}
\end{align*}

Finally, minimization of (6.14) with respect to $u$ obviously yields the same result as minimization of (6.15) with respect to $w$. Therefore $r^*(g)$ is invariant under linear transformations on $g$. ■

2.  Application  of  the  Performance  Index 

The previous section proposed the performance index $r^*(g)$ and developed some of its properties under very loose regularity conditions on the two hypothetical densities $f_0$ and $f_1$, as well as on the detection processor $g$. The index is usable for dependent as well as independent noise, and for linear or nonlinear processors, with or without memory.

The  iid  Noise  Case 

The  properties  of  the  index  will  be  explored  here  for  the  case  of 
independent  and  identically  distributed  observations  where  g  is  the  sum¬ 
mation  of  outputs  of  a  memoryless  nonlinear  transformation. 

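Proposition 2. Let the noise densities of hypothesis test (6.1) be $f_{(\cdot)}(\mathbf{x}) = \prod_{i=1}^{n} f_{(\cdot);i}(x_i)$, and the detection processor $g(\mathbf{x})$ be of the form $g(\mathbf{x}) = \sum_{i=1}^{n} g_i(x_i)$. Then

\[ M_0(u;g) = \sum_{i=1}^{n} \ln \int e^{u g_i(x_i)} f_{0;i}(x_i)\, dx_i \tag{6.16} \]

\[ M_1(u;g) = \sum_{i=1}^{n} \ln \int e^{-u g_i(x_i)} f_{1;i}(x_i)\, dx_i \tag{6.17} \]

Proof. The proof is a straightforward computation, outlined here for $M_0$ as

\begin{align*}
M_0(u;g) &= \ln \int e^{u g(\mathbf{x})} f_0(\mathbf{x})\, d\mathbf{x} \\
&= \ln \int \cdots \int e^{u \sum_{i=1}^{n} g_i(x_i)} \prod_{i=1}^{n} f_{0;i}(x_i)\, dx_1 \cdots dx_n \\
&= \ln \prod_{i=1}^{n} \int e^{u g_i(x_i)} f_{0;i}(x_i)\, dx_i \\
&= \sum_{i=1}^{n} \ln \int e^{u g_i(x_i)} f_{0;i}(x_i)\, dx_i
\end{align*}

The proof for $M_1$ follows similarly. ■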

When the noise observations are iid, then $f(\mathbf{x}) = \prod_{i=1}^{n} f(x_i)$. Here, the distinction between the multivariate and univariate densities should be clear from the arguments of the densities.

Corollary 1. When the noise is independent and identically distributed, and $g(\mathbf{x}) = \sum_{i=1}^{n} g_1(x_i)$, then

\[ M_0 = n M_0(u;g_1) = n \ln \int e^{u g_1(x)} f_0(x)\, dx \]

\[ M_1 = n M_1(u;g_1) = n \ln \int e^{-u g_1(x)} f_1(x)\, dx \]

The performance index becomes $r^*(g) = n r^*(g_1)$, where

\[ r^*(g_1) = \min_u \left[ M_0(u;g_1) + M_1(u;g_1) \right] \tag{6.18} \]

After inserting this result into (6.13), the bound on false dismissal probability becomes

\[ 1-\beta \le \frac{1}{\alpha_0} e^{n r^*(g_1)} \tag{6.19} \]

Thus, the bound on this error decreases exponentially with the number of data observations.
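Because the bound (6.19) decays exponentially in $n$, it can be inverted to give a number of observations sufficient to push the bound below a target level. The sketch below is a hypothetical illustration: the helper name, the false alarm rate, and the target are assumptions, and the per-sample index $r^*(g_1) = -\theta^2/4$ is the value obtained by minimizing (A6.1) at $u = 1/2$ for the linear detector in unit-variance Gaussian noise.

```python
import math

def samples_for_bound(r_star_1, alpha0, target):
    """Smallest n with (1/alpha0) * exp(n * r_star_1) <= target.

    Requires r_star_1 < 0 (Lemma 1) and alpha0 * target < 1.
    """
    return math.ceil(math.log(alpha0 * target) / r_star_1)

def miss_bound(n, r_star_1, alpha0):
    """Right side of (6.19)."""
    return math.exp(n * r_star_1) / alpha0

theta = 1.0                      # illustrative signal amplitude (assumption)
r_star_1 = -theta ** 2 / 4.0     # linear detector, Gaussian noise
alpha0, target = 0.01, 0.01      # illustrative rates (assumptions)

n = samples_for_bound(r_star_1, alpha0, target)
print(f"n = {n}")
print(f"bound at n     = {miss_bound(n, r_star_1, alpha0):.5f}")
print(f"bound at n - 1 = {miss_bound(n - 1, r_star_1, alpha0):.5f}")
```

The resulting $n$ guarantees only that the bound meets the target; the true false dismissal probability may reach the target with fewer observations.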


Two detectors, $g$ and $h$, may be compared by computing their relative efficiency, where $RE_{g,h} = \frac{n_h(\alpha_0, \beta)}{n_g(\alpha_0, \beta)}$, the ratio of the number of observations in the respective detectors operating with false alarm rate no greater than $\alpha_0$ and probability of correct detection at least $\beta$. While $r^*(g)$ does not allow computation of the exact value of $\beta$, it does allow computation of a bound on $1-\beta$.

Proposition 3. Suppose two memoryless detector nonlinearities $g$ and $h$, operating on iid observations, use $n_g$ and $n_h$ data observations, respectively, and

\[ \frac{n_h}{n_g} = \frac{r^*(g_1)}{r^*(h_1)} . \]

Then

\[ \frac{1}{\alpha_0} e^{n_g r^*(g_1)} = \frac{1}{\alpha_0} e^{n_h r^*(h_1)} . \]

Proof. The proof follows from direct computation. ■

The quantity $\frac{r^*(g_1)}{r^*(h_1)}$ may be designated as the relative bound efficiency of detector $g$ relative to detector $h$. Thus

\[ RBE_{g,h} = \frac{r^*(g_1)}{r^*(h_1)} \tag{6.20} \]

is a measure of the relative rates of convergence in the false dismissal probability of two detectors operating with equal false alarm rates. Alternatively, it may be considered as a measure of the ratio of the number of samples needed in each detector to obtain equal bounds on the false dismissal probability for equal false alarm rates. Note that (6.20) extends easily by replacing $r^*(g_1)$ with $r^*(g)$. Thus, the RBE of two detectors may be compared for non-iid noises, and detectors with memory.
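To make (6.20) concrete, the sketch below evaluates the RBE of the sign detector relative to the linear detector in unit-variance Gaussian noise over a range of signal amplitudes. It assumes the closed forms of Appendix 6.1: $r^*(g_{sd}) = 2\ln 2 + \ln\Phi(\theta/2) + \ln\Phi(-\theta/2)$ from (A6.5), and $r^*(g_{ld}) = -\theta^2/4$, the minimum of (A6.1) at $u = 1/2$. The function names and the grid of $\theta$ values are hypothetical.

```python
import math

def gaussian_cdf(x):
    """Standard Gaussian cdf; erfc keeps the far tail from underflowing to 0."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def r_star_sign(theta):
    """Per-sample index of the sign detector in Gaussian noise (A6.5)."""
    return (2.0 * math.log(2.0)
            + math.log(gaussian_cdf(theta / 2.0))
            + math.log(gaussian_cdf(-theta / 2.0)))

def r_star_linear(theta):
    """Per-sample index of the linear detector in Gaussian noise."""
    return -theta ** 2 / 4.0

def rbe_sign_vs_linear(theta):
    """RBE of the sign detector relative to the linear detector, as in (6.20)."""
    return r_star_sign(theta) / r_star_linear(theta)

for theta in (0.01, 0.1, 1.0, 5.0, 20.0):
    print(f"theta = {theta:6.2f}   RBE = {rbe_sign_vs_linear(theta):.4f}")
```

For small $\theta$ the ratio is close to $2/\pi \approx 0.637$, the familiar small-signal efficiency of the sign detector against the linear detector in Gaussian noise, and for large $\theta$ it drifts slowly toward $1/2$.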

A related measure of efficiency is the Chernoff asymptotic relative efficiency [2,11], or $ARE^*_{g,h}$, defined as

\[ ARE^*_{g,h} = \frac{\min\left[ \min_u M_0(u;g),\ \min_u M_1(u;g) \right]}{\min\left[ \min_u M_0(u;h),\ \min_u M_1(u;h) \right]} . \]

The proposed measure RBE differs from $ARE^*$ in that RBE measures the relative rates of convergence of $1-\beta$ under equal false alarm rates for the two detectors.


Detection  of  a  Known  Constant  Signal 

An often discussed special case is the problem of detecting the presence or absence of a known constant signal in additive iid noise. When the signal is positive, this problem is sometimes known as the shift-to-the-right problem. The univariate noise densities under the respective hypotheses become

\[ f_0(x) = f(x) \]

\[ f_1(x) = f(x - \theta s) \]

where $\theta$ is the known constant signal amplitude, and $f$ is the univariate density of the additive noise. For convenience, and without loss of generality, we shall hereafter assume $s = 1$. A common situation is for the noise to have zero mean, and for the density to be symmetric about the mean. The optimal detector will then be odd-symmetric about the point $\theta/2$. Under these conditions, we have

Proposition 4. If

\[ f_1(x) = f_0(x - \theta) \tag{6.21} \]

\[ f_0(x) = f_0(-x) \tag{6.22} \]

\[ g(x + \theta/2) = -g(-x + \theta/2) \tag{6.23} \]

then $M_1(u;g) = M_0(u;g)$.

Proof. The proof begins with the definition

\[ M_1(u;g) = \ln \int e^{-u g(x)} f_1(x)\, dx . \]

Applying (6.21) yields

\begin{align*}
M_1(u;g) &= \ln \int e^{-u g(x)} f_0(x - \theta)\, dx \\
&= \ln \int e^{-u g(z + \theta/2 + \theta/2)} f_0(z)\, dz
\end{align*}

and applying (6.23) gives

\[ M_1(u;g) = \ln \int e^{u g(z)} f_0(-z)\, dz . \]

Finally, application of (6.22) yields the desired result

\[ M_1(u;g) = M_0(u;g) . \]

It is possible to show in the shift-to-the-right problem that, as the signal vanishes, the quantity $RBE_{g,h}$ for two detectors approaches asymptotically the value of $ARE_{g,h}$.

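Theorem 4. In the shift-to-the-right problem, with iid noise and detectors $g(\mathbf{x}) = \sum_{i=1}^{n} g(x_i;\theta)$ and $h(\mathbf{x}) = \sum_{i=1}^{n} h(x_i;\theta)$ that are odd-symmetric about $\theta/2$, with test structure (6.6), and with test thresholds $E_0 g < t_g < E_1 g$ and $E_0 h < t_h < E_1 h$, respectively, let the false alarm rate be equal in both tests. Then

\[ \lim_{\theta \to 0} RBE_{g,h} = ARE_{g,h} . \]

Proof. The power of the test using $g$ is

\begin{align*}
\beta_g &= \mathrm{Prob}\left[ g(\mathbf{x}) > n t_g \mid H_1 \right] \\
&= \mathrm{Prob}\left[ g(\mathbf{x}) < n t_g \mid H_0 \right]
\end{align*}

and similarly for $\beta_h$. By application of Chernoff's theorem [8, 11] it may be shown after some simple algebra that

\[ \lim_{n \to \infty} \frac{1}{n} \ln \mathrm{Prob}\left[ g(\mathbf{x}) > n t_g \mid H_1 \right] = \min_u \left[ -u t_g + M_1(u;-g) \right] \tag{6.24} \]

and that

\[ \lim_{n \to \infty} \frac{1}{n} \ln \mathrm{Prob}\left[ g(\mathbf{x}) < n t_g \mid H_0 \right] = \min_u \left[ u t_g + M_0(u;-g) \right] \tag{6.25} \]

independently of the value of $\theta$. By Proposition 4, $M_0(u;-g) = M_1(u;-g)$. Therefore

\[ \min_u \left[ -u t_g + M_1(u;-g) \right] = \tfrac{1}{2} \min_u \left[ M_0(u; -g - t_g) + M_1(u; -g + t_g) \right] \tag{6.26} \]

and

\[ \min_u \left[ u t_g + M_0(u;-g) \right] = \tfrac{1}{2} \min_u \left[ M_0(u; -g + t_g) + M_1(u; -g - t_g) \right] \tag{6.27} \]

However, (6.24) and (6.25) imply that (6.26) and (6.27) must be equal. As a consequence of Proposition 1, they are equal to $\tfrac{1}{2} r^*(g)$.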

Following Capon [9], let $\{\theta_k\}$ be a sequence of signals such that $\lim_{k \to \infty} \theta_k = 0$, and let the sequences $\{n_{g;k}\}$ and $\{n_{h;k}\}$ be two increasing sequences of integers such that

\[ 0 \ne \lim_{k \to \infty} \beta_g(\theta_k, n_{g;k}) = \lim_{k \to \infty} \beta_h(\theta_k, n_{h;k}) \ne 1 \tag{6.28} \]

Since the nonlinearities $g$ and $h$ are functions of $\theta$, we will denote the sequences of nonlinearities dependent on $\{\theta_k\}$ as $g_k$ and $h_k$, respectively. Then

\[ \lim_{k \to \infty} \ln \beta_g(\theta_k, n_{g;k}) = \lim_{k \to \infty} \ln \mathrm{Prob}\left[ g_k(\mathbf{x}) > n t \mid H_1 \right] \]

and similarly for detector $h_k$. The ratio of false dismissal probabilities for the two detectors may be written as

\[ \frac{n_{h;k} \ln \beta_g(\theta_k, n_{g;k})}{n_{g;k} \ln \beta_h(\theta_k, n_{h;k})} = \frac{n_{h;k} \ln \mathrm{Prob}\left[ g(\mathbf{x}) > n_{g;k} t_g \mid H_1 \right]}{n_{g;k} \ln \mathrm{Prob}\left[ h(\mathbf{x}) > n_{h;k} t_h \mid H_1 \right]} \tag{6.29} \]

By previous arguments, it follows that in the limit the right side of (6.29) becomes the ratio

\[ \lim_{k \to \infty} \frac{r^*(g_{1;k})}{r^*(h_{1;k})} \tag{6.30} \]

which by the definition (6.20) of RBE is the quantity $\lim_{k \to \infty} RBE_{g_k,h_k}$. Condition (6.28) assures that in the limit the powers of detectors $g_k$ and $h_k$ are equal, which reduces the left side of (6.29) to the definition of asymptotic relative efficiency

\[ \lim_{k \to \infty} \frac{n_{h;k}}{n_{g;k}} = ARE_{g,h} . \]

The conclusion then is that as $\theta \to 0$, the quantities $RBE_{g,h}$ and $ARE_{g,h}$ are asymptotically equivalent. Note that in the limit, nonlinearities $g_k$ and $h_k$ are odd-symmetric about the origin. ■


Numerical  Examples 

In  this  section,  the  performance  index  is  calculated  and  compared 
for  three  different  detector  structures  in  two  different  noise  environ¬ 
ments  for  the  shift-to-the-right  problem.  The  objective  is  to  decide 
between 

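\[ H_0:\ x_i \sim f(x_i) \qquad\qquad H_1:\ x_i \sim f(x_i - \theta) \qquad \text{for } i = 1, \ldots, n \text{ and } \theta > 0 \]

using a test

\[ \sum_{i=1}^{n} g(x_i) \underset{H_0}{\overset{H_1}{\gtrless}} t . \]

The three detector nonlinearities which will be examined are the linear detector

\[ g_{ld}(x) = \theta (x - \theta/2) \tag{6.31} \]

the sign detector

\[ g_{sd}(x) = \mathrm{sgn}(x - \theta/2) \tag{6.32} \]

and the amplifier limiter

\[ g_{al}(x) = \begin{cases} -\theta\sqrt{2} & \text{for } -\infty < x \le 0 \\ 2\sqrt{2}\,(x - \theta/2) & \text{for } 0 < x < \theta \\ \theta\sqrt{2} & \text{for } \theta \le x < \infty \end{cases} \tag{6.33} \]

The two densities which will be used are the Gaussian density

\[ f_G(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2} \tag{6.34} \]

and the Laplace density

\[ f_L(x) = \frac{1}{\sqrt{2}}\, e^{-\sqrt{2}\,|x|} \tag{6.35} \]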

The three detectors are illustrated in Figures 6.1 - 6.3. Note that $g_{ld}(x)$ is the Neyman-Pearson optimal nonlinearity for $f = f_G$, and $g_{al}$ is the NP optimal nonlinearity for $f = f_L$.
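The nonlinearities (6.31) - (6.33) and the densities (6.34) - (6.35) are simple enough to write out directly. The sketch below is a hypothetical rendering (the function names are not from the text) that also compares each nonlinearity with the per-sample log-likelihood ratio $\ln[f(x - \theta)/f(x)]$ on a grid and checks odd symmetry about $\theta/2$. A direct computation shows that the linear detector coincides with the Gaussian log-likelihood ratio and the amplifier limiter with the Laplace one.

```python
import math

SQRT2 = math.sqrt(2.0)

def g_ld(x, theta):
    """Linear detector (6.31)."""
    return theta * (x - theta / 2.0)

def g_sd(x, theta):
    """Sign detector (6.32), with sgn(0) = 0."""
    d = x - theta / 2.0
    return (d > 0) - (d < 0)

def g_al(x, theta):
    """Amplifier limiter (6.33): slope 2*sqrt(2), clipped at +/- theta*sqrt(2)."""
    return max(-theta * SQRT2, min(theta * SQRT2, 2.0 * SQRT2 * (x - theta / 2.0)))

def log_f_gauss(x):
    """ln of the unit-variance Gaussian density (6.34)."""
    return -0.5 * x * x - 0.5 * math.log(2.0 * math.pi)

def log_f_laplace(x):
    """ln of the unit-variance Laplace density (6.35)."""
    return -SQRT2 * abs(x) - 0.5 * math.log(2.0)

def llr(log_f, x, theta):
    """Per-sample log-likelihood ratio for the shift-to-the-right problem."""
    return log_f(x - theta) - log_f(x)

theta = 1.0  # illustrative value (assumption)
grid = [-4.0 + 0.125 * k for k in range(65)]

max_err_gauss = max(abs(g_ld(x, theta) - llr(log_f_gauss, x, theta)) for x in grid)
max_err_laplace = max(abs(g_al(x, theta) - llr(log_f_laplace, x, theta)) for x in grid)

# Odd symmetry about theta/2, the condition used in Proposition 4.
odd_symmetric = all(
    abs(g(theta / 2.0 + d, theta) + g(theta / 2.0 - d, theta)) < 1e-12
    for g in (g_ld, g_sd, g_al)
    for d in (0.1, 0.5, 1.0, 3.0)
)

print(f"max |g_ld - Gaussian LLR| = {max_err_gauss:.2e}")
print(f"max |g_al - Laplace LLR|  = {max_err_laplace:.2e}")
print(f"all three odd-symmetric about theta/2: {odd_symmetric}")
```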

The methods of this chapter may be applied to calculate $r^*(g)$, and the RBE of various pairs of detectors under the two noise environments. Appendix 6.1 gives the formulation of $r(u;g,f)$ for all six combinations of detector nonlinearities and densities. Here, $f$ appears as an argument of $r$ to emphasize the dependence of $r^*(g;f)$ on a single univariate density. For some combinations of nonlinearities and densities, $r^*$ or $u^*$ is given, but for others this value must be found through numerical methods.

Fig. 6.2. The amplifier limiter detector nonlinearity $g_{al}$ (vertical axis $g_{al}(x)$, horizontal axis $x$).

Fig. 6.3. The sign detector nonlinearity $g_{sd}$ for $\theta = 1$.

Fig. 6.4. Performance comparison of the amplifier limiter and the sign detector relative to the linear detector in Gaussian noise.

Figures 6.4 and 6.5 present the RBE of the detector pairs under Gaussian and Laplace noise assumptions, respectively. Since both densities were defined with unit variance, the horizontal axis of the plots is also a measure of the signal-to-noise ratio (SNR). The nonlinearities are parameterized as a function of $\theta$; thus, as $\theta$ becomes small, the shapes of $g_{al}$ and $g_{sd}$ become nearly identical relative to a fixed observation scale.

As predicted in Theorem 4, $RBE_{al,ld}$ and $RBE_{sd,ld}$ asymptotically approach $ARE_{sd,ld}$ for small $\theta$. When the SNR (equivalently, $\theta$) becomes large, $RBE_{al,ld}$ approaches unity for both densities, implying that under this condition the amplifier limiter and the linear detector have the same efficiency. Also, $RBE_{sd,ld}$ converges to $\tfrac{1}{2}$, as shown in Appendix 6.2.

For comparison, the ARE of various detector pairs may also be calculated as a function of the parameter $\theta$. For the purpose of calculating efficacy, it is assumed that the nonlinearities are symmetric about zero instead of $\theta/2$. Therefore, in Appendix 6.1 the efficacy is given for the shifted nonlinearities $g(x + \theta/2)$. Figures 6.6 - 6.11 compare RBE and ARE for pairs of detectors under the different noise assumptions.

All six of the figures further emphasize the convergence of $ARE_{sd,ld}$, $RBE_{al,ld}$, and $RBE_{sd,ld}$ for small $\theta$. The performance of the amplifier limiter and the linear detector are approximately equivalent for high SNR, as shown by both ARE and RBE. Notice that, while ARE predicts a constant performance level for the sign detector, Figures 6.8 - 6.11 emphasize that the linear detector or amplifier limiter may well outperform the sign detector for moderate to high SNR. Michalsky, Wise, and Poor [12] studied the convergence of relative efficiency to asymptotic relative efficiency in the finite sample size detector and observed similar difficulties with ARE. They also found that in certain cases relative efficiency may produce a different ranking of detector performance than would asymptotic relative efficiency.

Fig. 6.5. Performance comparison of the amplifier limiter and the sign detector relative to the linear detector in Laplace noise (horizontal axis: $20 \log_{10} \theta$).

Fig. 6.6. Comparison of ARE and RBE of the amplifier limiter relative to the linear detector in Gaussian noise.

Fig. 6.7. Comparison of ARE and RBE of the amplifier limiter relative to the linear detector in Laplace noise (horizontal axis: $20 \log_{10} \theta$).

Fig. 6.8. Comparison of ARE and RBE of the sign detector relative to the linear detector in Gaussian noise.

Fig. 6.9. Comparison of ARE and RBE of the sign detector relative to the linear detector in Laplace noise.

Fig. 6.10. Comparison of ARE and RBE of the amplifier limiter relative to the sign detector in Gaussian noise.

Fig. 6.11. Comparison of ARE and RBE of the amplifier limiter relative to the sign detector in Laplace noise.

3.  Conclusion 

Some  properties  of  a  functional  r*(g )  were  developed  in  this  chapter, 
and  it  was  shown  that  r*{g )  is  a  performance  measure  which  may  be 
potentially  useful  for  studying  the  performance  of  finite  sample  size 
detectors.  In  this  regard,  it  may  be  considered  as  a  figure  of  merit,  or  a 
performance  index  for  a  detector.  As  was  demonstrated,  r*(g )  is  a  quan¬ 
tity  which  may  be  used  to  form  an  exponential  bound  on  1-/3.  Thus,  the 
smaller  the  value  r*(g),  the  smaller  the  bound  on  false  dismissal  proba¬ 
bility.  Given  a  pair  of  hypotheses,  r*(g)  may  be  used  to  rank  competing 
alternative  structures. 

A disadvantage of bounding methods is that it is not clear that comparing and ordering systems by a performance bound corresponds exactly to an ordering of the systems by their error probabilities. Indeed, we resort to a bounding method precisely because we are unable to calculate (and hence order by) the error probabilities of the alternative systems. A bound is useful for comparing the relative merits of systems, though, for a bound guarantees a certain minimum performance level. As a result, it is reasonable to say that the tightest bound corresponds in some sense to the "best" system.

Of particular utility is the ratio of the respective $r^*(g)$ indices for two competing detectors. This ratio was denoted as relative bound efficiency, or RBE, and was shown to be asymptotically equivalent to ARE as the signal-to-noise ratio vanishes. For finite SNR, however, the behavior of RBE diverges from ARE and follows more closely intuition about the relative efficiency of several common finite sample size detectors.




Appendix 6.1

In the previously stated conditions of the shift-to-the-right problem, $r(u;g) = 2 M_0$. Therefore, for the six combinations of nonlinearities and noise densities, an expression for $r(u;g) = 2 M_0$ will be given, and when a simple form can be found, an expression for $r^*(g)$ or $u^*$. For comparison, the expressions for the efficacy $\eta$ of the shifted (zero-centered) nonlinearities are also given. The cumulative distribution of the Gaussian density is written here as $\Phi(x)$.

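1. Gaussian density, linear detector

\[ r(u; g_{ld}, f_G) = u (u - 1)\, \theta^2 \tag{A6.1} \]

\[ r^*(g_{ld}, f_G) = -\frac{\theta^2}{4} \tag{A6.2} \]

\[ \eta_G(g_{ld}(x + \theta/2; \theta)) = 1 \tag{A6.3} \]

2. Gaussian density, sign detector

\[ r(u; g_{sd}, f_G) = 2 \ln\left[ e^{-u}\, \Phi(\theta/2) + e^{u}\, \Phi(-\theta/2) \right] \tag{A6.4} \]

\[ r^*(g_{sd}, f_G) = 2 \ln 2 + \ln \Phi(\theta/2) + \ln \Phi(-\theta/2) \tag{A6.5} \]

\[ \eta_G(g_{sd}(x + \theta/2; \theta)) = \frac{2}{\pi} \tag{A6.6} \]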

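3. Gaussian density, amplifier limiter

\[ r(u; g_{al}, f_G) = 2 \ln\left\{ \tfrac{1}{2} e^{-\sqrt{2} u \theta} + \Phi(-\theta)\, e^{\sqrt{2} u \theta} + \left[ \Phi(\theta - 2\sqrt{2} u) - \Phi(-2\sqrt{2} u) \right] \exp\left[ -(\sqrt{2} u \theta - 4 u^2) \right] \right\} \tag{A6.7} \]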
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77g(^(x  +  ^/2);0))  = 


1— 4>(— <9/2) 


V^TT 


e2  3_ 
4  ’  2 


fA6.8) 


Here,  f  is  the  incomplete  gamma  function 


r(x,y)  =  J~  e_TT5/  1 dr 


4. Laplace density, linear detector

\[ r(u; g_{ld}, f_L) = 2 \ln 2 - u \theta^2 - 2 \ln(2 - u^2 \theta^2) \tag{A6.9} \]

\[ u^* = 2\, \frac{-1 + \sqrt{1 + \theta^2/2}}{\theta^2} \tag{A6.10} \]

\[ \eta_L(g_{ld}(x + \theta/2; \theta)) = 1 \tag{A6.11} \]
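The closed form (A6.10) can be cross-checked by minimizing (A6.9) numerically. The sketch below is an illustration only: it uses a golden-section search (valid because $r$ is convex in $u$) on the interval where $2 - u^2\theta^2 > 0$, with $\theta = 1$ as an arbitrary choice, and the helper names are hypothetical.

```python
import math

def r_linear_laplace(u, theta):
    """r(u; g_ld, f_L) from (A6.9); defined for u^2 * theta^2 < 2."""
    return (2.0 * math.log(2.0) - u * theta ** 2
            - 2.0 * math.log(2.0 - (u * theta) ** 2))

def u_star_closed_form(theta):
    """Minimizing u from (A6.10)."""
    return 2.0 * (-1.0 + math.sqrt(1.0 + theta ** 2 / 2.0)) / theta ** 2

def golden_section_min(fn, lo, hi, tol=1e-10):
    """Minimize a unimodal function on [lo, hi]; returns (argmin, min)."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if fn(c) < fn(d):
            b, d = d, c
            c = b - inv_phi * (b - a)
        else:
            a, c = c, d
            d = a + inv_phi * (b - a)
    x = 0.5 * (a + b)
    return x, fn(x)

theta = 1.0                        # illustrative value (assumption)
u_max = math.sqrt(2.0) / theta     # r(u) diverges as u approaches u_max
u_num, r_num = golden_section_min(
    lambda u: r_linear_laplace(u, theta), 0.0, 0.999 * u_max)

u_closed = u_star_closed_form(theta)
r_closed = r_linear_laplace(u_closed, theta)

print(f"u* numeric {u_num:.6f}   closed form {u_closed:.6f}")
print(f"r* numeric {r_num:.6f}   closed form {r_closed:.6f}")
```

Setting the derivative of (A6.9) to zero gives $u^2\theta^2 + 4u - 2 = 0$, whose positive root is (A6.10); the numerical search should land on the same value.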


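5. Laplace density, sign detector

\[ r(u; g_{sd}, f_L) = 2 \ln\left[ e^{-u} + \frac{e^{-\theta\sqrt{2}/2}}{2} \left( e^{u} - e^{-u} \right) \right] \tag{A6.12} \]

\[ r^*(g_{sd}, f_L) = \ln 2 - \frac{\theta\sqrt{2}}{2} + \ln\left[ 1 - \frac{e^{-\sqrt{2}\,\theta/2}}{2} \right] \tag{A6.13} \]

\[ \eta_L(g_{sd}(x + \theta/2; \theta)) = 2 \tag{A6.14} \]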


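6. Laplace density, amplifier limiter

\[ r(u; g_{al}, f_L) = 2 \ln\left\{ \frac{e^{-u\theta\sqrt{2}} + e^{(u-1)\theta\sqrt{2}}}{\sqrt{2}} + e^{-u\theta\sqrt{2}}\, \frac{1 - e^{(2u-1)\theta\sqrt{2}}}{\sqrt{2}\,(1 - 2u)} \right\} - \ln 2 \tag{A6.15} \]

\[ r^*(g_{al}, f_L) = 2 \ln\left[ e^{-\theta\sqrt{2}/2}\, (2 + \theta\sqrt{2}) \right] - 2 \ln 2 \tag{A6.16} \]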


VL(9al(x+e/  2 ;©)) 


|_e-V292/2 
g  -V29/2 


1- 


2 


(02+0V2+2) 


(A6.20) 
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Appendix  6.2 

Here it is demonstrated that $RBE_{sd,ld}$ approaches asymptotically the value $\tfrac{1}{2}$ for the shift-to-the-right problem and iid observations. To begin, let $p = F(\theta/2)$, where $F$ is the cumulative distribution function of the noise density. Then

\[ M_0(u; g_{sd}) = \ln\left[ e^{-u} p + e^{u} (1 - p) \right] \tag{A6.18} \]

and it is easy to show that $u^* = \tfrac{1}{2} \ln \frac{p}{1-p}$. Using this value of $u^*$,

\[ r_1^*(u^*; g_{sd}) = \ln p + \ln(1 - p) + 2 \ln 2 \tag{A6.19} \]

for, as a consequence of Proposition 4, $r_1 = 2 M_0$.

It may be shown [10] through a saddlepoint expansion approach that

\[ \ln(1 - p) = M_0(u^*; g_{ld}) + \varepsilon(\ln \theta^{-1}) \tag{A6.20} \]

where $\varepsilon$ represents the approximation error, of order $\ln \theta^{-1}$. Using (A6.19) and (A6.20), the ratio $RBE_{sd,ld}$ may be written as

\[ \frac{r_1^*(g_{sd})}{r_1^*(g_{ld})} = \frac{\ln p + \ln(1 - p) + 2 \ln 2}{2 \ln(1 - p) - 2 \varepsilon(\ln \theta^{-1})} \tag{A6.21} \]

Finally, after noting that $\lim_{\theta \to \infty} p = 1$ and applying L'Hôpital's Rule to (A6.21), we conclude that

\[ \lim_{\theta \to \infty} RBE_{sd,ld} = \tfrac{1}{2} \tag{A6.22} \]
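The limit (A6.22) can be observed numerically in Laplace noise, where both indices are available in closed form. The sketch below is an illustration under stated assumptions: it takes $r^*(g_{sd}, f_L)$ from (A6.13) and evaluates $r^*(g_{ld}, f_L)$ by substituting $u^*$ from (A6.10) into (A6.9). The function names and the chosen values of $\theta$ are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)

def r_star_sign_laplace(theta):
    """r*(g_sd, f_L) from (A6.13)."""
    a = SQRT2 * theta / 2.0
    return math.log(2.0) - a + math.log(1.0 - 0.5 * math.exp(-a))

def r_star_linear_laplace(theta):
    """r*(g_ld, f_L): (A6.9) evaluated at u* from (A6.10)."""
    u = 2.0 * (-1.0 + math.sqrt(1.0 + theta ** 2 / 2.0)) / theta ** 2
    return (2.0 * math.log(2.0) - u * theta ** 2
            - 2.0 * math.log(2.0 - (u * theta) ** 2))

def rbe_sign_vs_linear(theta):
    """RBE of the sign detector relative to the linear detector, as in (6.20)."""
    return r_star_sign_laplace(theta) / r_star_linear_laplace(theta)

for theta in (0.01, 0.1, 1.0, 10.0, 100.0, 1000.0):
    print(f"theta = {theta:8.2f}   RBE = {rbe_sign_vs_linear(theta):.4f}")
```

For small $\theta$ the ratio is close to 2, the ratio of the efficacies (A6.14) and (A6.11), and for large $\theta$ it decreases toward $1/2$, although the convergence is slow because of the logarithmic correction term.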




References 

[1] P.J. Bickel and K.A. Doksum, Mathematical Statistics: Basic Ideas and Selected Topics, Holden-Day, Inc.: San Francisco, 1977.

[2] H. Chernoff, "A Measure of Asymptotic Efficiency for Tests of a Hypothesis Based on the Sum of Observations", Ann. Math. Stat., vol. 23, pp. 493-507, 1952.

[3] T. Kailath, "The Divergence and Bhattacharyya Distance Measures in Signal Selection", IEEE Trans. Commun. Tech., vol. COM-15, no. 1, pp. 52-60, Feb. 1967.

[4] R.E. Blahut, "Hypothesis Testing and Information Theory", IEEE Trans. Inform. Theory, vol. IT-20, no. 4, pp. 405-417, July 1974.

[5] D. Kazakos, "Signal Detection under Mismatch", IEEE Trans. Inform. Theory, vol. IT-28, no. 4, pp. 681-684, July 1982.

[6] D. Kazakos, "Statistical Discrimination using Inaccurate Models", IEEE Trans. Inform. Theory, vol. IT-28, no. 5, Part I, pp. 720-728, Sept. 1982.

[7] H.L. Van Trees, Detection, Estimation, and Modulation Theory, Part I, John Wiley and Sons: New York, 1971.

[8] P. Billingsley, Probability and Measure, John Wiley and Sons: New York, 1979.

[9] J. Capon, "On the Asymptotic Relative Efficiency of Locally Optimum Detectors", IRE Trans. Inform. Theory, vol. IT-7, pp. 67-71, April 1961.

[10] F.W.J. Olver, Asymptotics and Special Functions, Academic Press: New York, 1974.

[11] R.J. Serfling, Approximation Theorems of Mathematical Statistics, John Wiley and Sons: New York, 1980.

[12] D.L. Michalsky, G.L. Wise, and H.V. Poor, "A Relative Efficiency Study of some Popular Detectors", J. of Franklin Inst., vol. 313, pp. 135-148, March 1982.

[13] D.G. Luenberger, Optimization by Vector Space Methods, John Wiley and Sons, Inc.: New York, 1969.


7 


Conclusion 


In  this  chapter,  the  main  contributions  of  the  dissertation  are 
reviewed,  and  some  suggestions  for  extending  this  research  are  made. 

1.  Review  and  Suggestions 


Chapter  2 

Detection procedures and noise models were highlighted in Chapter 2, and the failings of the classical Gaussian noise assumption were noted. As an alternative model, a definition of non-Gaussian noise density was given, and several commonly used non-Gaussian noise models were exhibited.

The critical feature of non-Gaussian noise as defined here is that
its density is much heavier tailed than the Gaussian density. In this
type of detection environment it is important to reduce the influence of
the very large observations; even just a few impulses, or outliers, can
seriously disturb detector performance.

Finally, in light of the work in robust statistics, and the work in
optimal detection in non-Gaussian noise, it was proposed to examine
simple adaptive detectors which are useful when only a very loose
characterization of the noise statistics is available.

At the end of the chapter, some Arctic under-ice noise data was
discussed. It would be interesting to study its characteristics further,
particularly its distribution and dependency structure. Does the
data fit any commonly used models?

Chapter  3 

The following conjecture was proposed and exploited successfully:
Suppose some generic detector nonlinearity with a roughly linear region
near the origin is chosen that allows freedom in selection of the
nonlinearity tail behavior. Then, it should be possible to make
measurements on the observed noise and adjust the nonlinearity tails
appropriately.

Two alternative techniques were proposed and studied: the tail
matching method, giving gtm, and the efficacy maximizing procedure,
leading to the piecewise linear processor gZi. In the examples, both
methods were able to achieve high levels of performance relative to the
optimal structure, even though both nonlinearities were ad hoc proposals
and only very simple measurements of the noise density were used to
drive adaptation. When simulated with the physical noise data, both
detectors appeared to have improved performance relative to the linear
detector.

The conclusion to be drawn from the chapter is that, when choosing
the form for a nonlinearity, it is not critical to find the exactly optimal
structure. Rather, it is possible to achieve nearly optimal results using
quite simple structures, provided that there is enough freedom to adapt
the structure to the particular noise density of interest. As noted in
Chapter 2, the specification of a particular class of non-Gaussian
densities leads to generic specification of the class of suitable
approximate nonlinearities.
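
As a rough numerical illustration of this conclusion, the sketch below
compares the efficacy of a linear detector and a simple soft limiter with
the optimum in a heavy-tailed noise density. Everything specific in it is
an assumption made for illustration and is not taken from the dissertation:
the contaminated Gaussian mixture (EPS, SIGMA), the limiter breakpoint c,
the integration grid, and the efficacy normalization E[g g_lo]^2 / E[g^2]
for odd g in symmetric noise, where g_lo = -f'/f is the locally optimum
nonlinearity.

```python
import numpy as np

EPS, SIGMA = 0.1, 10.0                 # hypothetical contamination level and impulse scale
X = np.linspace(-80.0, 80.0, 160001)   # integration grid spanning 8 impulse standard deviations
DX = X[1] - X[0]

def gauss(x, s):
    """Zero-mean Gaussian density with standard deviation s."""
    return np.exp(-0.5 * (x / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

def density(x):
    """Assumed noise density: unit-variance background plus rare wide impulses."""
    return (1.0 - EPS) * gauss(x, 1.0) + EPS * gauss(x, SIGMA)

def g_lo(x):
    """Locally optimum nonlinearity -f'(x)/f(x) for the assumed density."""
    fprime = -(1.0 - EPS) * x * gauss(x, 1.0) - EPS * (x / SIGMA**2) * gauss(x, SIGMA)
    return -fprime / density(x)

def expect(values):
    """Expectation under the assumed density, by a Riemann sum on the grid."""
    return float(np.sum(values * density(X)) * DX)

def efficacy(g):
    """Efficacy of an odd nonlinearity g: E[g g_lo]^2 / E[g^2]."""
    gx = g(X)
    return expect(gx * g_lo(X)) ** 2 / expect(gx ** 2)

def linear(x):
    return x

def limiter(x, c=1.5):
    """Soft limiter: linear near the origin, constant tails beyond +/- c."""
    return np.clip(x, -c, c)

fisher = expect(g_lo(X) ** 2)          # efficacy of the optimum nonlinearity
for name, g in (("linear", linear), ("limiter", limiter)):
    print(f"{name}: {efficacy(g) / fisher:.3f} of the optimal efficacy")
```

Under these assumptions the limiter recovers most of the optimal efficacy
while the linear detector does not, although neither the limiter shape nor
its breakpoint was tuned to the density.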

It would be worthwhile to investigate further certain properties of
suboptimal detectors. For instance, the performance of a suboptimal
nonlinearity is sometimes less sensitive to changes in the noise
environment than that of the optimal nonlinearity. What causes this
property, and how may it best be employed? Can other methods besides
Huber's min-max approach produce robust detector nonlinearities?

Chapter  4 

When a nominal background noise is contaminated by bursts of
impulsive noise, it was shown that it is possible to design a structure
which recognizes the bursts, and then uses this information to adapt the
detector rapidly. The structure was developed in two parts: one part was
a time varying detector which switched between two nonlinearities, and
the other part was a nonparametric noise burst detector utilizing a
median filter.

Under one reasonable and realistic set of assumptions, it was
demonstrated that the switched burst detector can outperform any fixed
detector structure.
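
The two-part structure can be pictured with the minimal sketch below. It
is not the algorithm analyzed in Chapter 4: the window length, the
threshold, the burst gain, and the function names burst_flags and
switched_statistic are all hypothetical. The sketch assumes that a burst is
declared when the running median of the observation magnitudes exceeds a
fixed threshold, and that the signal detector switches from a unit gain
linear processor to a low gain linear processor while a burst is flagged.

```python
import numpy as np

def burst_flags(x, window=9, threshold=3.0):
    """Flag samples where the running median of |x| exceeds the threshold.

    window must be odd so that the output has the same length as x.
    """
    pad = window // 2
    mag = np.pad(np.abs(x), pad, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(mag, window)
    return np.median(windows, axis=1) > threshold

def switched_statistic(x, flags, burst_gain=0.1):
    """Sum the observations, scaling those inside flagged bursts by burst_gain."""
    gains = np.where(flags, burst_gain, 1.0)
    return float(np.sum(gains * x))

# Toy record: small background, one isolated impulse, one 20-sample burst.
n = np.arange(100)
x = 0.5 * (-1.0) ** n
x[10] = 50.0
x[40:60] = 20.0 * (-1.0) ** n[40:60]

flags = burst_flags(x)
print("flagged samples:", np.flatnonzero(flags))
print("switched statistic:", switched_statistic(x, flags))
```

Because the median ignores a minority of large samples in the window, the
isolated impulse at index 10 is not declared a burst, while the sustained
burst is flagged over its whole extent.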

One problem mentioned in the chapter and worthy of further
attention is analysis of the switched burst detector algorithm when a
statistical model for the noise burst lengths is available. Also, given a
statistical description of the burst run lengths, how may the
nonparametric burst detector be improved? Probably, this knowledge would
lead to a threshold test in which the threshold varies as a function of
the number of observations since the last state transition was encountered.

Another important area to be investigated is the use of alternatives
to linear detectors during the impulsive noise modes. Would any
performance advantage due to the use of robust nonlinearities outweigh the
loss of simplicity when the low gain linear alternative is replaced?

Chapter  5 

In Chapter 5 the equivalence between efficacy maximizing
procedures and minimum mean square approximation of the true locally
optimum nonlinearity is demonstrated. In particular, the results lend
substance to some loose ideas about what constitutes a "good"
approximation: it is important to match the optimal nonlinearity closely
in the regions where an observation is highly probable, while a rough
approximation is sufficient in low probability regions such as the density
tails. Moreover, once an approximation is fairly "close" to the true
nonlinearity, further refinements lead to little performance improvement.
This is not to say that any nonlinearity tail behavior will suffice; the
mean square error between a linear processor and blanking nonlinearity
tails can be great, despite weighting by a relatively small probability
mass, and it is known that the linear processor performs poorly in
heavy-tailed noises.
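
The equivalence can be sketched in a few lines under the standard setting
of independent, identically distributed noise. The notation below is
assumed for illustration and may differ from the normalization used in
Chapter 5: f is a symmetric noise density, g is an odd nonlinearity with
finite second moment, and all expectations are taken under f.

```latex
% Assumed notation: g_{lo} is the locally optimum nonlinearity,
% I(f) the Fisher information for location, \eta(g) the efficacy of g.
g_{lo}(x) = -\frac{f'(x)}{f(x)}, \qquad
I(f) = E_f\!\left[g_{lo}^2\right], \qquad
\eta(g) = \frac{\left(E_f[g\,g_{lo}]\right)^2}{E_f[g^2]}

% Minimizing the mean square error over the free gain a:
\min_{a}\; E_f\!\left[(g_{lo} - a\,g)^2\right]
  = I(f) - \frac{\left(E_f[g\,g_{lo}]\right)^2}{E_f[g^2]}
  = I(f) - \eta(g),
\qquad a^{*} = \frac{E_f[g\,g_{lo}]}{E_f[g^2]}
```

Since I(f) does not depend on g, maximizing the efficacy is the same as
minimizing the density-weighted mean square error to the locally optimum
nonlinearity. The weighting by f is what makes errors in high probability
regions costly and errors in the tails comparatively cheap.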

This  chapter  provides  a  distance  measure  between  nonlinearities.  Is 
it  possible  to  find  a  min-max  robust  suboptimal  nonlinearity  using  this 
tool?  Another  useful  extension  of  this  work  might  include  examination  of 
nonlinearity  approximation  procedures  in  the  dependent  noise  case. 

Chapter  6 

A performance index r*, useful for comparing detectors operating
with equal false alarm rates, was developed in Chapter 6. It was shown
that the ratio of performance indices for two detectors is a useful
indicator of their relative performance under non-zero signal to noise
ratios. Further, this ratio, the proposed measure of relative bound
efficiency, approaches the measure of asymptotic relative efficiency as
the signal vanishes.

It would be worthwhile to examine r* and relative bound efficiency
further. For instance, it would be interesting to examine their use in
dependent noise. There are other open points: how does relative bound
efficiency compare to relative efficiency? How tight is the bound on
performance using r*? Is it possible to find r* directly and circumvent
the proposed minimization procedure?

2.  Conclusion 

The underlying goal of this study was to consider the signal detection
problem in the case of incomplete knowledge of the non-Gaussian noise
environment. In striving towards this goal, work was presented ranging
from simulations using physical noise data to theoretical analysis.

The theoretical results of this thesis may be useful tools in the
continued study of nearly optimal detectors. The proposals for detector
structures presented here are not definitive; however, they do confirm
some ideas about useful approaches to this problem, and point out
possible directions for further research.